PREFACE


Establishing meaningful links across biological and cultural lines of 
evidence constitutes the core objective of research on human evolution, 
as this process enables the understanding of the complex environmental 
factors driving hominin behavioral adaptations. Given the multifaceted 
nature of human behavior, deciphering the course of its evolution is 
impossible without interdisciplinary research relying upon the integra- 
tion of different — yet complementary — scientific fields, including 
archaeology, biological anthropology, primatology, linguistics, and 
paleogenetics. However, due to the natural complexity of synthesizing 
evidence from such diverse methodological frameworks, multidisciplinary
approaches to reconstructing the behavior of past humans are
still scarce.

One of the fundamental objectives of our DFG Center for Advanced 
Studies “Words, Bones, Genes, Tools: Tracking linguistic, cultural and 
biological trajectories of the human past” is to establish a proper collabo- 
rative framework for such multidisciplinary research. This edited vol- 
ume, entitled “Biocultural Evolution: An Agenda for Integrative 
Approaches,” collects the proceedings of the Center’s sixth annual
symposium, which took place in Tübingen on December 3rd-4th, 2021 (in a
hybrid format). In total, more than 30 international scholars participated
in this event (either online or in person), representing a diverse
spectrum of scientific fields. This volume is composed of nine chapters 
corresponding to most of the research presented at the symposium, 
including both original research papers and critical methodological
reviews. These contributions are grouped into three thematic units, 
focusing on the methodological foundations for reconstructing habitual 
physical activity in the past (Chapters 1–3), the potential cognitive
underpinnings of stone tool use in extant humans and nonhuman primates
(Chapters 4–6), and key aspects of human linguistic evolution
(Chapters 7–9).

The volume begins with an introductory perspective piece by Co-Editor
Fotios Alexandros Karakostis, asking whether “...humans only do
what they are good at?” (Karakostis, this volume). This chapter high- 
lights the conceptual difference between evolved functional adaptations 
and reflections of daily behavior in the fossil hominin record (associated 
with phenotypic plasticity), arguing that this distinction is often unclear 
in the hypotheses and interpretations of anthropological studies (for 
example, those on early stone tool use and Neanderthal manual behavior).
On this basis, Karakostis recommends that future studies on hominin
physical activity attempt to differentiate among the concepts of “basic
functional capacity,” “evolved efficiency,” and actual “habitual behav- 
ior,” highlighting that the latter largely depends on environmental and 
cultural conditions. The author concludes with methodological sugges- 
tions for addressing each of these three behavioral components sep- 
arately. 

In Chapter 2, leading anthropologist Jane E. Buikstra provides a dili- 
gent critical review of the methods currently used to reconstruct habitual 
activity based on the morphology of human skeletal remains (Buikstra, 
this volume), relying on the example of a renowned bioarchaeological 
context (i.e., the Phaleron cemetery of Archaic Athens). After reviewing 
a wide range of activity markers, including both pathological and healthy 
bone modifications, the author concludes by highlighting the great poten- 
tial of specific approaches, which involve the use of three-dimensional 
muscle attachment sites (entheses) and long bone cross-sectional geome- 
try. Furthermore, the author underlines the necessity and importance of 
further advancements in the existing methodologies. 

Chapter 3 (Wallace et al., this volume) represents an original
experimental study on laboratory animals (guinea pigs) aiming to elucidate
the causal factors of osteoarthritis, which is one of the most widely utilized
skeletal activity markers in anthropological sciences. Their results sug- 
gest that increased physical activity can inhibit the appearance of knee 
osteoarthritis, thus questioning the traditional “wear and tear” assump- 
tion that the presence of this degenerative joint disease in a skeleton may 
directly reflect overall strenuous physical activities. Based on these find- 
ings and those of other studies by the same authors, Wallace and col- 
leagues recommend caution when using knee osteoarthritis as an indi- 
cator of overall physical activity levels. 

Chapter 4 (Bril, this volume) opens the second thematic unit with a
diligent synthesis of the author’s previous human stone-knapping experiments
focusing on percussive actions. Through her extensive review, Bril posits 
that a deeper understanding of stone-knapping can be achieved by rely- 
ing on a “bottom-up” functional framework that properly distinguishes 
between the concepts of “technique” (i.e., the physical modalities of 
action) and “method” (i.e., the sequence of actions required to reach a 
goal). The author concludes that, in contrast to common assumptions, the 
techniques required for flake production are complex and their mastery 
requires extensive practice. 

In Chapter 5 (Motes-Rodrigo and Tennie, this volume), the authors
rely on original experimental data involving extant primate species 
(chimpanzees and gorillas) to revisit the proposed role of social learning 
in producing and using stone tools at the early stages of hominin evolu- 
tion. Their results showed that nonhuman primates are unable to socially 
learn how to knap from human demonstrations, contradicting the findings
of previous experimental works. The authors suggest that this
discrepancy between studies is likely due to the fact that, in
contrast to previous research, their experiments involved unenculturated 
and untrained individuals. 

The second thematic unit closes with Chapter 6 (Kalan, this volume), 
which focuses on the crucial and highly controversial topic of laterality 
and the evolution of handedness. The author reviews previous experi- 
mental studies on handedness in nonhuman primates, discussing the 
theories that link the evolution of laterality with that of stone tool use and 
language. Subsequently, Kalan proposes an innovative multimodal 
approach for investigating handedness that considers the role of sounds 
and auditory information in stone tool-using behaviors. 

Chapter 7 (Dediu et al., this volume) opens the third thematic unit of 
this volume, which addresses key research questions surrounding lin- 
guistic evolution. Dediu and colleagues address the patterning and evolu- 
tion of dental fricatives, which represent a cross-linguistically rare group 
of consonants that tends to be present in some of the most widely spoken 
languages today. Relying on an innovative approach integrating linguis- 
tic and three-dimensional anatomical data, the authors propose that the 
diachronic scarcity and geographic patterning of dental fricatives may be
affected by patterns of anatomical variation in the anterior oral vocal 
tract. 

Chapter 8 (Cathcart, this volume) provides a diligent review of var- 
ious applications of rate variation models in the field of diachronic lin- 
guistics that aim to assess linguistic change. This synthesis also includes 
the discussion of biological models that are often overlooked in linguistic 
literature. Relying on detailed observations, the author proposes a new 
analytical framework for investigating linguistic change, defined as 
“Distributional Phylogenetic Modeling,” reporting the results of original 
research that is currently in progress. 

Finally, in the concluding Chapter 9 (Enfield and Sidnell, this vol- 
ume), the authors focus on a fundamental property of language, which is 
its reflexivity (i.e., its ability to refer to itself). In this perspective piece, 
the authors provide a definition for reflexivity in language and discuss its 
evolutionary implications for modern humans, including its role in key 
aspects of human social behavior and organization. Additionally, in the 
context of the known hypothesis that metalanguage might be a
prerequisite for language, the authors suggest the potential evolutionary
importance of employing repair practices (e.g., the use of the word “Huh?”).

We are deeply grateful to all participants of the symposium and con- 
tributors to this volume, whose excellent presentations and chapters led 
to the development of this remarkable collection. We are also extremely 
thankful to the entire organizing committee of the symposium, composed 
of Fotios Alexandros Karakostis, Miri Mertner, Marisa Köllner, Monika 
Doll, Gerhard Jäger, and Katerina Harvati. Importantly, we are
extremely thankful to both principal investigators of the DFG Center
for Advanced Studies “Words, Bones, Genes, Tools: Tracking linguistic,
cultural and biological trajectories of the human past”, Katerina Harvati
and Gerhard Jäger, for providing the framework and means required to
organize the symposium and this edited volume. Special thanks are also 
due to Kerns Verlag, as well as all the colleagues who kindly agreed to
review the chapters published in this volume, for all their meticulous and 
valuable work during this book’s development. We would also like to 
thank all student assistants, volunteers, and members of the DFG Center
who helped us with organizing the symposium and volume, including
Lourdes Gabriela Tamayo Caceres, Kim Apholz, Simona Affinito, Brie 
Eteson, Elena Moos, Julia Zastrow, and Alessio Maiello. The funding 
required for the conference and the volume was provided by the German 
Research Foundation (Deutsche Forschungsgemeinschaft), in the frame- 
work of the DFG-Kollegforschergruppe Center for Advanced Studies 
“Words, Bones, Genes, Tools” (DFG FOR 2237). Finally, we are deeply 
thankful to our families and colleagues for their care, patience, and sup- 
port. 


Fotios Alexandros Karakostis, Gerhard Jäger
February 2023 

CHAPTER ONE


“Do humans only do what they are good at?” 
Distinguishing between daily behaviors and evolved 
functional adaptations in fossil hominins 


Fotios Alexandros Karakostis


Abstract 


In the introductory chapter of this edited volume, I argue that
paleoanthropological research on hominin behavioral evolution tends to
overlook the conceptual
distinction between a species’ basic anatomical capacity to carry out a certain 
physical task (e.g., the ability to climb), its evolved biomechanical efficiency in 
performing that activity (e.g., arboreal climbing efficiency), and each individual's 
habitual physical activities (e.g., frequency and intensity of climbing throughout 
life). Using a few key examples from the literature, I posit that the lack of
this theoretical distinction can compromise the integrity of
paleoanthropological hypotheses and interpretations surrounding hominin
biocultural evolution. Lastly, this
chapter encourages future evolutionary studies to always strive to address all 
three behavioral components (capacity, evolved efficiency, and habitual behav- 
ior), relying on appropriate methods (and morphological traits) for each of them. 


INTRODUCTION 


Reconstructing habitual behavior in the past constitutes a major objective
of archaeological sciences centered on bio-cultural evolution. One of the 
most fundamental components of a population’s behavior involves its 
daily bodily activities, which form the physical expression of its cultural 
and subsistence practices in response to the dynamic conditions of the 
surrounding environment. Essentially, reconstructions of physical
activity attempt to answer the fundamental question of “What did people do
in the past?”, seeking to piece together faint and fragmentary reflections
of hominin daily life, subsistence strategies, social hierarchy, symbolic 
behavior, and cultural practices. The importance of this difficult objec- 
tive, which is sometimes referred to as the “Holy Grail of 
Bioarchaeology” (Jurmain et al. 2011), has led to the emergence of an 
entire field of evolutionary research, involving a plethora of proposed 
theoretical concepts, methods, and techniques (e.g., Karakostis and 
Harvati 2021; Kivell 2016; Wallace et al. 2017). Naturally, reconstruct- 
ing human behavior based on dry skeletal remains is always an arduous 
endeavor due to the inherent complexity and diversity of human behav- 
ior, combined with the multifactorial etiology of bone morphological 
variation (Pearson and Lieberman 2004; Ruff et al. 2006; Schrader 2019; 
Wallace et al. 2017). 

In evolutionary anthropology, numerous previous studies have relied 
on reconstructions of physical activity to address some of the most funda- 
mental questions on human evolution, such as the emergence of obligate 
bipedal locomotion in early hominins (e.g., Daver et al. 2022; Richmond 
and Jungers 2008), the earliest evidence of habitual stone tool use (e.g., 
Karakostis et al. 2021; Kivell 2015; Marzke 1997, 2013), behavioral and 
cultural differences across recent hominins (e.g., Neanderthals and mod- 
ern humans; see Bardo et al. 2020; Karakostis et al. 2018; Maki and Trin- 
kaus 2011; Niewoehner 2006; Pearson et al. 2006), the tool-making abil- 
ities of enigmatic fossil hominins (i.e., Homo naledi, Homo floresiensis, 
and Homo luzonensis; see Détroit et al. 2019; Kivell et al. 2015; Tocheri 
et al. 2007), or the proposed emergence of key aspects of behavioral 
modernity in Homo sapiens, including division of labor, greater environ- 
mental adaptability, and the production of sophisticated artifacts (e.g., 
see Estalrrich and Rosas 2015; Karakostis et al. 2020; Niewoehner 
2001). 

Typically, most anthropological studies addressing these major evolu- 
tionary questions have relied on comparing skeletal functional morphol- 
ogy across diverse hominin species and comparative samples of extant 
species (e.g., Dunmore et al. 2020; Kivell 2015; Marzke 2013; Marzke et 
al. 2010). Such morphological markers of activity are associated either 
with species-wide evolutionary adaptations to increased biomechanical 
efficiency (e.g., bone shape and joint configurations that allow efficient 
climbing locomotion or dexterous “in-hand” manipulation of objects) or 
lifetime alterations in bone morphology due to phenotypic plasticity and 
biomechanical loading history (Ruff et al. 2006). The latter mainly 
involve variation in long bone cross-sectional geometry, trabecular mor- 
phology, pathological lesions potentially associated with intense physical 
stress (e.g., osteoarthritis), and the morphology of muscle attachment 
sites on the bone surface (or “entheses”) (e.g., Kivell 2016; Schrader 
2019; Stock and Shaw 2007; Wallace et al. 2017). The basic premise of 
reconstructing activity based on most of these activity markers relies on 
the broad concept of “bone functional adaptation”, according to which 
bone form (size and shape) is expected to adapt to altered physical
stimuli both before and after adulthood (Pearson and Lieberman 2004; Ruff
et al. 2006). 

Even though the reliability of many of these activity markers has
often been questioned, mainly due to their multifactorial etiology and the lack
of supporting laboratory evidence (Schrader 2019; Wallace et al. 2017; 
Wallace et al. 2022), some of them have been repeatedly validated 
through extensive experimental work (e.g., Lieberman et al. 2004; Shaw 
and Stock 2009; Wallace et al. 2022). This includes a method I intro- 
duced in 2016 (Karakostis 2015; Karakostis and Lorenzo 2016), recently 
named the “Validated Entheses-based Reconstruction of Activity” 
(VERA) approach (see literature reviews by Karakostis 2022, and Kara- 
kostis and Harvati 2021), whose reliability has been supported in several 
studies involving diverse laboratory animals (e.g., Castro et al. 2022; 
Karakostis, Jeffery, et al. 2019; Karakostis, Wallace, et al. 2019;
Karakostis and Wallace 2023) as well as human skeletons with a universally
unique level of long-term occupational documentation (Karakostis et al. 
2017; Karakostis and Hotz 2022). A detailed critical review of most of
the above activity markers is provided in the chapter of this volume
authored by Dr. Jane Buikstra.

This chapter posits that the conceptual distinction between habitual 
physical activity and evolutionary functional adaptation is often unclear 
in the paleoanthropological literature centered on hominin biocultural 
evolution. Instead, the above skeletal markers of “habitual activity”
and “biomechanical efficiency” are typically lumped together as vague
indicators of human behavior. This misconception implies that
an individual’s daily living conditions and activities directly reflect its 
species’ evolutionary history, thus underestimating the crucial effects of 
environmental and/or cultural factors on an individual’s behavior. Here, I 
argue that this practice directly compromises the reliability of our 
hypotheses and interpretations surrounding the evolution of hominin 
behavior. This commentary paper, which forms the introductory chapter 
of this edited volume, is divided into three main sections: 


1. Habitual activity versus evolved efficiency 

2. Fake it until you make it? Habitual tool use versus tool-using 
dexterity in early hominins 

3. Neanderthals: Strong yet precise? 


HABITUAL ACTIVITY VERSUS EVOLVED EFFICIENCY 


By inferring hominin daily physical behavior based on skeletal markers 
of species-wide biomechanical efficiency (e.g., manual dexterity), it is 
effectively implied that hominin individuals habitually performed what 
they were best adapted for (from birth). This viewpoint blatantly over- 
looks the existence of ecological constraints that regulate the conditions 
of a population’s fitness within an ever-changing environment, which are
bound to vary greatly in time and space throughout a species’
evolutionary history. Moreover, in an evolutionary context, it seems unlikely that
the efficient performance of an activity can be naturally selected before 
this behavior is even practiced at all (at least to some degree) by a species 
or population that had not yet developed such efficiency (with the excep- 
tion of behaviors that depend on functional traits resulting from exapta- 
tion). Furthermore, using evolved traits to reconstruct habitual activity 
also underestimates the weight of cultural variability that may often be 
unrelated to the biomechanical constraints of our evolutionary history. 
This is especially the case for the more recent and larger-brained homi- 
nins (e.g., Neanderthals and early modern humans), who have been asso- 
ciated with sophisticated cultural practices (Villa and Roebroeks 2014). 

The importance of conceptually distinguishing between skeletal 
markers of habitual activity (associated with phenotypic plasticity) and 
biomechanical efficiency (resulting from evolutionary adaptation) is 
graphically demonstrated in the example of Figure 1 (more information 
provided in its legend), which focuses on arboreal climbing behaviors in 
chimpanzees and modern humans. As presented in that graph,
humans and chimpanzees share the basic functional capacity for per- 
forming arboreal climbing (represented by the left circle of the figure). 
Nevertheless, due to a series of evolutionary species-wide adaptations, 
chimpanzee bodies are much more efficient in climbing trees (the
highlighted area of the left circle). If this comparative framework were used to
infer daily behavior in individuals of these species, the conclusion would 
be that no modern human is habitually involved in arboreal climbing. 
However, as shown in the examples of the two bottom pictures, there are 
humans who habitually climbed throughout their lives (and cases of 
chimpanzees that never did; see Wallace et al. 2020). The behaviors 
depicted in these examples are in line with the specific individuals’ envi- 
ronmental contexts and demands (represented by the right circle of the 
figure). 

Evidently, these daily physical practices are not expected to silence 
the presence of genetically determined anatomical traits associated with 
climbing efficiency in a species (e.g., the occurrence of curved phalanges 
in chimpanzees), as previously demonstrated (e.g., Wallace et al. 2020). 
These habitual activities, however, might influence the morphology of 
bone traits affected by biomechanical stress throughout life (see section 
above), including internal bone structures and entheseal 3D surface mor- 
phology (e.g., Karakostis et al. 2019; Karakostis and Harvati 2021; 
Kivell 2016; Macintosh et al. 2017; Shaw and Stock 2009; Wallace et al. 
2017). 


“FAKE IT UNTIL YOU MAKE IT?”: HABITUAL TOOL USE VERSUS TOOL-
USING DEXTERITY IN EARLY HOMININS


Fig. 1. Above: Graphic summary of the distinction between the three
components of human behavior discussed in this chapter, defined as
“Capacity” (both anatomical and cognitive), “Efficiency” (evolved
species-wide functional adaptations), and “Habitual Behavior” (physical
activities frequently and/or intensively performed by an individual).
Each individual has the basic ability to perform an extensive range of
physical tasks (left circle representing “Capacity”). Nevertheless, some
species are better adapted for some of these activities compared to
others (lower area of left circle, representing “Efficiency”). Depending
on the environmental and cultural conditions of a geochronological
context (right circle), the “Habitual Behavior” of an individual (or
population) may potentially include daily activities for which its
species is not best adapted (in comparison to other species). Below:
When focusing on the simple example of arboreal climbing (lower part of
the graph), both humans and chimpanzees show the ability to carry out
this physical task, with chimpanzees being more efficient (due to
extensive evolutionary functional adaptations). Nevertheless, there are
modern humans known to habitually climb trees throughout their lifetime
(bottom right picture, freely accessible at the website www.pexel.com)
and chimpanzee individuals that never climbed a single tree (bottom left
picture, derived from a panel of a figure previously published in
Wallace et al. 2020; License: CC BY-NC-ND 4.0;
https://creativecommons.org/licenses/by-nc-nd/4.0/). Panel labels:
Example: Arboreal Climbing; Capacity: Pan and Homo; Evolved Efficiency:
Pan > Homo; Habitual Behavior depends on Conditions.

The origins of human-like stone tool use represent one of the most crucial
research questions in human evolutionary sciences. Traditionally,
habitual stone tool use had been broadly associated with the emergence of the
genus Homo. However, this viewpoint has now shifted due to the recent
discovery of proposed stone tool industries predating our genus
(Harmand et al. 2015) and the identification of australopithecine hand
skeletons bearing anatomical indications of increased manual dexterity
(e.g., Dunmore et al. 2020; Kivell 2015; Kivell et al. 2018). For instance,
Australopithecus sediba and, to a lesser degree, Australopithecus
afarensis exhibit a thumb that is proportionally much longer than in
chimpanzees, arguably facilitating the efficient performance of human-like
precision grips involving interactions between the thumb and the fingers
(Kivell et al. 2018). Relying on such observations, previous research
suggested that these species may have been the producers and users of early
stone tool industries (Kivell 2015). Based on a similar theoretical
principle, a previous biomechanical modeling study proposed that
Australopithecus afarensis was likely not the producer of the oldest
proposed stone tools (the Lomekwian industry) because of its fifth digit’s
low force-producing efficiency, which might have prevented it from
habitually manipulating the large Lomekwian cores and tools (Domalain
et al. 2017).

However, despite the great importance of these functional adaptations 
and their crucial implications for the biomechanical evolution of hominin 
dexterity, the etiology of these traits seems to be unrelated to phenotypic 
plasticity and daily manual behavior. As summarized in Figure 2, all 
hominins seem to have been capable of human-like precision grasping, 
regardless of the suggested comparatively low efficiency of some homi- 
nins (e.g., Australopithecus) in opposing their thumb and fifth rays 
(Domalain et al. 2017; Hopkins et al. 2002; Karakostis et al. 2021; 
Marzke et al. 1999). Assuming that the cognitive capacity of a fossil 
hominin species would have permitted its individuals to mentally con- 
ceive the production of stone tools, and that using tools would have been 
both feasible and beneficial within a particular environmental context, it 
seems nonsensical that these individuals would decide not to use stone 
tools because their hands were comparatively less dexterous than those 
of another hominin species (i.e., our comparative samples in an analysis). 
In fact, considering that early hominins were probably fully capable of 
precise manipulation (like all extant great apes; Hopkins et al. 2002),
it seems very unlikely that they would only start using stone tools after
their manual efficiency gradually increased. Instead, unless hominin
tool-using behaviors arose and evolved exclusively through exaptation, it 
seems more plausible that the adaptive value of tool-using dexterity grew 
in hominins who were already using stone tools (at least to some degree) 
and whose fitness could thus directly benefit from higher manipulatory 
efficiency (see discussion in Kunze et al. 2022). 

In two recent studies (Karakostis et al. 2021; Kunze et al. 2022), my 
collaborators and I addressed early hominin biomechanical efficiency 
and habitual behavior separately, using different methods for each of the 
two behavioral components. In Karakostis et al. (2021), we relied on an 
integrative biomechanical modeling approach for directly calculating 
thumb opposition efficiency considering the effects of both the missing 
muscle architecture and bone morphology. The results showed that the 
earliest proposed tool-using species, including Australopithecus sediba, 
exhibited relatively low thumb opposition dexterity. In fact, when our
models assumed a chimpanzee-like muscle force-producing capacity, the 
thumb dexterity of that species was similar to that of extant chimpanzees. 
Nevertheless, the application of the VERA method to the hand entheses
of Australopithecus sediba indicated the habitual use of a muscle that is
essential for human-like stone tool use (Kunze et al. 2022), while another
study showed that its thumb’s trabecular morphology was
consistent with human-like manipulation (Dunmore et al. 2020).
Altogether, these findings suggest that the individual representing this
hominin species may have frequently performed human-like hand grips
regardless of its comparatively low thumb dexterity (Kunze et al. 2022;
also see plot at the bottom of Fig. 2). These opposing trends between
reconstructed efficiency and habitual behavior in Australopithecus
sediba highlight the importance of distinguishing between these two
notions (and associated analytical methods) when approaching the
evolution of hominin behavior.

Fig. 2. Above: Graphic summary of the distinction between the three
components of human behavior discussed in this chapter (see legend of
Fig. 1). Below: In this example, it is argued that all hominins were
probably capable of performing human-like thumb-based precision grasping
(left circle), similarly to extant great apes. However, according to
biomechanical modeling research (Karakostis et al. 2021), it seems that
the thumbs of the genus Homo were clearly more dexterous than those of
australopithecines. Nevertheless, as shown in the bottom principal
component analysis plot (previously published open access in Kunze et
al. 2022), the analysis of 3D entheseal surfaces (using the VERA method)
identified habitual human-like manipulation patterns in certain
australopithecines (Kunze et al. 2022), in line with analyses of thumb
trabecular structures (Dunmore et al. 2020). Panel labels: Example:
Human-like Precision Grasping; Capacity: Possibly all hominins; Evolved
Efficiency: Homo > Australopithecus; Habitual Behavior: Human-like
manipulation in Australopithecus.


NEANDERTHALS: STRONG YET PRECISE?


Another interesting example relates to the manual behaviors of
Neanderthals. In comparison to modern humans, Neanderthal hand
bones tend to be more robust and bear more pronounced muscle
attachment sites (Niewoehner 2006). Moreover, their thumb presented a
relatively (slightly) shorter length, different proportions between the two
adjoining phalanges, and a basal (trapeziometacarpal) joint configuration 
consistent with less pollical flexibility in some respects (Churchill 2001; 
Niewoehner 2001, 2006; Niewoehner et al. 2003). Based on these 
anatomical indications, previous studies hypothesized that the 
Neanderthal lifestyle predominantly relied on transverse power-grasping 
motions, which were in turn associated with the use of composite tools 
and hafting (Niewoehner 2006). More recently, this hypothesis of habit- 
ual use of hafted tools received additional support from a biomechanical 
study focusing on the thumb’s basal joint (Bardo et al. 2020), which 
reported that Neanderthal thumbs were better adapted for performing 
thumb extension (i.e., a central component of power grasping) compared
to other hand movements (e.g., the opposition of the thumb for human- 
like precision grasping). 

However, it could be argued that the above anatomical configurations 
are of genetic origin, occurring across Neanderthal skeletons from 
extremely diverse geochronological and environmental contexts (Nie- 
woehner 2006). Furthermore, previous biomechanical modeling studies 
demonstrated that Neanderthal hands were perfectly capable of perform- 
ing modern human-like precision grasping involving small objects (Feix 
et al. 2015; Karakostis et al. 2021; Niewoehner et al. 2003). On this basis, 
by hypothesizing that the above anatomical traits can be used to recon- 
struct daily manual activity in Neanderthals, it is indirectly—yet 
clearly—assumed that their presumed power-grasping adaptations would 
lead them to systematically prefer power grips even in environmental 
contexts necessitating precision grasping (of which they were
anatomically capable). This hypothesis contradicts experimental studies
suggesting that producing and manipulating the vast majority of small stone
tools commonly associated with Neanderthal contexts (e.g., Mousterian 
flakes) would primarily require the use of thumb-index precision grasp- 
ing (Key and Lycett 2018). In contrast, direct evidence of hafting (pre- 
sumably associated with transverse power grasping) remains extremely 
scarce across Neanderthal contexts, while being entirely absent in many 
Neanderthal sites with rich lithic assemblages (Claud et al. 2019; Nie- 
woehner 2006). Finally, there is an increasing body of archaeological 
evidence suggesting that Neanderthals were engaging in various prac- 
tices requiring a high level of manual precision, such as the production of 
specialized bone tools, making of cordage, and the use of tar (Hardy et al. 
2013; Soressi et al. 2013; Villa and Roebroeks 2014).

In a previous study (Karakostis et al. 2018), my collaborators and I 
sought to address this controversy between the above-mentioned findings 
of biomechanical modeling (showing that Neanderthals had the capacity 
to precisely manipulate small objects), archaeological evidence (involv- 
ing the use of predominantly “microlithic” tools and precise manual 
practices), and functional indications of efficiency (showing that
Neanderthal hands may have been better adapted for transverse power
grasping). This was attempted through the application of my experimentally
validated VERA method (Karakostis 2015, 2022) on a diverse comparative
sample of Neanderthals, early modern humans, and recent individuals
with extensively documented life histories. The results showed clear
evidence of habitual thumb-index precision grasping in Neanderthals,
contradicting the traditional viewpoint that Neanderthals habitually
relied on power grasping (reflected in their high bone robusticity), and
agreeing with the latest archaeological indications on the daily manual
behavior of this species. Figure 3 summarizes the proposed distinction among
Neanderthal capacity, efficiency (dexterity), and habitual behavior for
precision grasping behaviors.

[Figure 3, lower panel. Example: Neanderthal grasping. Capacity: both
power and precision grasping. Evolved efficiency: better adapted for
power grasping? Habitual behavior: evidence of precise manipulation in
Neanderthal biological and cultural remains. Plot axis: principal
component (26% of total variance).]

Fig. 3.

Above: Graphic summary of 
the distinction between the 
three components of human 
behavior discussed in this 
chapter (see legend of Fig. 1). 
Below: This example argues 
that Neanderthals are consid- 
ered to have been anatomi- 
cally capable of performing 
both power and precision 
grasping, while some studies 
reported that their hands were 
better adapted (more efficient) 
for transverse power grasping 
motions (e.g., Niewoehner 
2006; Bardo et al. 2020). 
However, previous research 
on their hand 3D entheses rely- 
ing on an experimentally val- 
idated approach (VERA; 
Karakostis 2015, 2022; 
Karakostis and Lorenzo 2016) 
found clear evidence of habit- 
ual precision grasping (see 
bottom plot, previously pub- 
lished open access in 
Karakostis et al. 2018). This 
result fits well with experimen- 
tal evidence on the grips 
required to produce and 
manipulate the — predomi- 
nantly — small stone tools 
associated with Neanderthals 
(Key and Lycett 2018), while 
also reflecting the latest 
archaeological indications 
regarding Neanderthal manual 
behavior (see information and 
citations in text). 

CONCLUDING REMARKS


The examples provided in this chapter posit that anthropological hypoth- 
eses and interpretations surrounding fossil hominin behavior could 
greatly benefit from properly distinguishing among the fundamental con- 
cepts of broader anatomical capacity, evolved biomechanical efficiency, 
and habitual physical activity (Figs. 1 to 3). This distinction can be facili- 
tated by the selection of appropriate methods for addressing each of these 
three basic components of physical behavior. Anatomical capacity and 
efficiency can be best evaluated based on biomechanical modeling tech- 
niques, which can be used to directly assess the functional significance 
of genetically determined morphological traits (e.g., species-wide overall 
bone robusticity trends, joint configuration/orientation, or consistent pro- 
portions among different bone elements), while also attempting to 
account (as much as possible) for the potential influence of the missing 
soft tissue (e.g., Karakostis et al. 2021). However, reconstructing habitual 
physical activities of an individual based on its species-wide evolved 
functional traits is highly misleading, as it blatantly overlooks the 
dynamic environmental and/or cultural factors affecting daily hominin 
behavior and subsistence strategies within each species. In contrast, that 
objective can be more reliably addressed by studying bone structures that 
are experimentally shown to reflect biomechanical loading history 
throughout life (for a more detailed perspective, see this book’s dedicated 
chapters by Jane Buikstra and Ian Wallace). Any resulting indication on 
habitual physical activity should always be interpreted on the basis of 
each species’ possible constraints (anatomical and cognitive capacity) 
and evolved biomechanical efficiency (e.g., level of dexterity), focusing 
on adequately reconstructed paleoenvironmental and/or cultural contexts 
(Fig. 1). 

CHAPTER TWO


Inferring ancient human activities through skeletal 
study: An example from Archaic Greece 


Jane E. Buikstra¹


Abstract 


This paper critically evaluates the range of skeletal attributes available for identi- 
fying activity patterns in the past. Our investigation is contextualized in the study 
of Phaleron, an Archaic Greek site where non-elite individuals were buried. We 
consider the following skeletal attributes: fractures, osteoarthritis, entheseal 
remodeling, bone cross-sectional diameters, bone density, and trabecular pat- 
terning as possible sources of information, finding that entheseal remodeling and 
long bone shape and density imaging hold the greatest potential. Studies of frac- 
ture patterning will also be employed to explore the risk of interpersonal violence 
and accidental bone breaks. 


“Then let thy forest-feller cut thee all 
Thy chamber fuel, and the numerous parts 
Of naval timber apt for shipwrights’ arts.” 
Hesiod [~700 BCE] in Chapman (1875: 234) 


INTRODUCTION 


Contextualized in an archaeological case study, this paper addresses the 
challenges of interpreting ancient behavior from human remains. Our 
ancient subject is the vast Phaleron cemetery (ca. 700-480 BCE), which 
contained over 1600 remains of people who lived during a socially, polit- 
ically, and economically transformative period, just as Greek democracy 
was emerging along with the consolidation of the Greek State 
(Chryssoulaki 2019a, 2019b). This Archaic period (ca. 800-480 BCE) 
has largely been characterized in terms of elites: archaeologically,
historically, and epigraphically. By contrast, the burials at Phaleron
apparently represent those whose lives and identities at death had been
marginalized from the elite core and are as yet unknown.

Characterizing the people of Phaleron depends upon analyses of the 
archaeological remains, primarily the skeletons themselves, their burial 
contexts, and associated material culture. One important aspect of this 
bioarchaeological research is the interpretation of ancient activities, 
including those that reflect occupational specializations. Following brief 
discussions of the Phaleron archaeological context and historical sources 
that consider occupations from this period, the paper engages in a review 
of the current status of skeletal methods appropriate for inferring activity 
at Phaleron, including antemortem and perimortem fractures, entheseal 
changes, bone cross-sectional diameters, and joint degeneration. Among 
the methods based upon entheses, the “Validated Entheses-based Recon- 
struction of Activity” (VERA) method (Karakostis and Lorenzo 2016; 
see dedicated review by Karakostis and Harvati 2021) receives special 
emphasis due to the rigor of this approach. We conclude with recommen- 
dations for best practices in the study of activity and occupation at 
Phaleron. 


Occupations during the 1st Millennium BCE


Attica’s immediate post-Palatial periods (1050+ BCE) are commonly
characterized by historians and archaeologists in terms of myriad agricul- 
tural communities arrayed across a variegated landscape. We, therefore, 
assume that most activities were agrarian or in support of peasant life- 
ways. By the early decades of Archaic times, however, there is written 
evidence of non-agrarian occupational specialization. Extracted from 
Hesiod’s Works and Days, the above epigraph reflects the poet’s experience
in his agricultural community of Ascra in ancient Boeotia, near the
base of Mt. Helicon. As he advocated for diligence at the dawn of the 
Archaic period, Hesiod identified several non-agrarian occupational spe- 
cialties, including bard, beggar, “maker,” potter, blacksmith, and car- 
penter (Davies 2018: 59). Specialists in arboriculture and shipbuilding, 
as well as those who sailed the ships, are also implied. 

Shifting our perspective forward to the Classical period (500-250 
BCE), immediately postdating Archaic times, Harris (2001) recorded a 
surprising (to Harris) count of 170 occupations, ranging across spe- 
cialties in food production, retail sales, various services, finances, the 
plastic and performing arts, and education. Harris (2001) emphasized 
marked horizontal differentiation in social organization, without signifi- 
cant vertical, administrative specializations. Important for bioarchae- 
ological efforts to reconstruct occupational diversity, these activities 
range from heavy laborers, such as stone-cutters, miners, and butchers, to 
those with lighter physical demands, such as barbers, speechwriters, or 
harp players. This listing does not include the military, athletes, and avo- 
cations of the elite classes. Similarly, it does not speak to the proportions 
of people engaged in specific tasks. The development of professions has
thus been identified during the Classical period, hinting at earlier origins
(Stewart et al. 2020). 

Hesiod’s short list and Harris’s extensive catalog bracket the Archaic 
period (ca. 800-480 BCE), the subject of this essay. Here we explore 
methods for identifying the activities associated with our ancient people; 
the written record has largely passed the Archaic period by. Both authors 
identify occupations requiring heavy labor, Hesiod’s blacksmiths and 
Harris’s miners, which contrast with their bards and harp players. We 
may therefore surmise that a wide range of physical demands faced the 
ancient Archaic communities from which our Phaleron burials were 
drawn. 

Our analytical question now becomes how to characterize the people 
of Phaleron in relationship to the range of occupational specialization 
defined in the centuries bracketing the Archaic period. What can we say, 
and how accurate can we be? Are we able to distinguish fractures attrib- 
utable to interpersonal violence from those that result from nonviolent 
activities? What attributes accurately identify perimortem trauma, as 
opposed to postmortem bone alterations? Can we identify those engaged 
in heavy labor, compared to those in less demanding daily behaviors? 
Can we proceed further, with at least a few occupations associated with 
distinctive suites of behaviors? Which skeletal attributes are the most 
reliable for characterizing bone-altering behaviors, having been tested in 
other, documented contexts? Are we able to identify (groups of) adoles- 
cent-young adult individuals whose daily lives suggest that occupations 
are being defined through apprenticeship at an early age? The goal of this 
paper is to evaluate methods commonly used by bioarchaeologists for 
identifying either severe episodic or repetitive behaviors that assist in 
characterizing the daily lives of the people of Phaleron. Let us first turn 
briefly to the archaeological record and the site of Phaleron itself. 


Phaleron 


Overviews of the Phaleron site and its excavation have been provided by 
Dr. Stella Chryssoulaki, the excavator (2019a, 2019b). Under excavation 
between 2012 and 2020, the site has produced approximately 1960 
human remains, thus far including the highly visible “Esplanada Mass 
Graves” (Ingvarsson and Bäckström 2019), which were left in situ for 
display. These biaiothanatoi, considered to have suffered violent deaths, 
have been grouped with other mass and non-normative graves to form a 
distinctive desmotes burial form or “D-Group.” A minority of adults were 
buried in cist tombs or were incinerated in pyres; juveniles were gen- 
erally interred within jars. Most of the adult individuals were interred in 
simple pit graves, without associated material culture. The Phaleron 
Bioarchaeological Project, centered at the Wiener Laboratory of the 
American School of Classical Studies in Athens, focuses upon the 
approximately 1200 interments from the 2012-2013 season and serves as 
the inspiration for this essay. 
That the Phaleron site falls within the Hallstatt Plateau, a term for a
flat, uninformative region of the radiocarbon calibration curve between
800 and 400 BCE, means that accurate chronometric resolution via radiometric
dating during the Greek Archaic period is challenging at best and often impossible
(Damon 1989; Davis et al. 1992; Jacobsson et al. 2017; Millard 2008; 
van Geel et al. 1998). A further complicating factor is the poor preserva- 
tion of organic portions of bone, which further limits prospects of radio- 
carbon dating. Stratigraphic dating is also challenging, due to the homo- 
geneity of the coastal beach sand and the invisibility of pit outlines and 
other structural details. 

Fortunately, ceramic analyses provide resolution within one to two 
generations—a quarter to a half-century—for many of the contexts. 
Thus, linking those features’ assigned dates through ceramic analyses to 
nearby located graves requires careful archaeological studies of context 
and assumptions about originating levels for pit excavations (Buikstra et 
al. n.d.). 

The goals of the Phaleron Bioarchaeological Project include exploring
the life histories and identities of the individuals interred in the
Phaleron Cemetery (Buikstra et al. n.d.; Prevedorou and Buikstra 2019;
Prevedorou et al. n.d.). We, therefore, marshal osteological information
about age-at-death, biological sex, stature, dental/skeletal pathology,
inherited features, and activity indicators to infer social age, gender,
health, kinship, and occupation (see Buikstra et al. 2022). The
last-mentioned
includes inferences of extreme and habitual behaviors based upon bony 
evidence, such as fracture patterning, osteoarthritis, entheseal develop- 
ment, and bone-cross-sectional shape. However enticing the prospects, 
this is a speculative domain, subject to considerable critical review (Judd 
2008; Judd and Redfern 2012; Jurmain 1999; Jurmain et al. 2012; Pearson
and Buikstra 2006; Wallace et al. 2017a). With a focus on the Phaleron
context, we examine the potential of these various lines of evidence
through a critical, scientific lens. 


ACTIVITY INDICATORS: HOW DO WE CHOOSE INDICATORS AND
METHODS FOR DATA COLLECTION AND ANALYSES? 


Fractures 


Skeletal changes commonly reflect fractures, ranging from extreme, 
acute events to structural failure due to repetitive events, perhaps aggra- 
vated by poor bone quality. It is important that we recognize the speci- 
ficity of each, working from the bony change to the interpretative 
ultimate cause. Certainly, our first task is recognizing, for example, a 
fracture and understanding the proximate cause (Lovell 1997; Walker 
2001). There are many useful guides to this process, including those 
drawn from bioarchaeology and forensic anthropology, such as
Galloway (1999), Lovell and Grauer (2018), Redfern and Roberts 
(2019), Symes et al. (2012), and Wedel and Galloway (2014). That 
accomplished, the researcher must turn to the next, more challenging
step: estimating the ultimate cause, given the individual’s life history. 
The medico-legal and biomedical sciences literatures provide many use- 
ful examples that link ultimate causes to ranges of fracture forms, e.g., 
lethal falls from a height (Rowbotham and Blau 2016), or spondylolysis 
with athletics (Syrmou et al. 2010), or the Clay Shoveler’s Fracture (de 
Boer et al. 2016). These clinical sources should be consulted. 


Sharp Force Trauma 


While sharp force trauma is certainly of interest, it is quite rare in the 
Phaleron sample and therefore most amenable to osteobiographical 
approaches to interpretation. Medical knowledge during the Archaic 
period doubtless foreshadowed the Hippocratic writings, with the treatise
“On Head Wounds” being especially germane (Hanson 1999). The example
illustrated in Figure 1, IV_554, undoubtedly reflects sophisticated and
effective medical intervention. The person treating the patient apparently 
removed fragments or allowed the wound to suppurate and any frag- 
ments to “float” to the surface. It appears that, in this case, the edges of 
the bony wound were rasped and smoothed, but the bone (inner cortex) 
immediately adjacent to the membrane (dura mater) was left intact. 

Blunt Force Trauma


Cranial: methods in use for population approach/temporal change

Anthropologists studying ancient cranial trauma generally follow Walker 
(1989: 328) in assessing whether cranial trauma observed in a skeletal 
series resulted primarily from 1) accidents, e.g., falls; 2) intentional inter- 
personal violence, ranging from one on one to group interactions; and 3) 
self-inflicted forms. We will concentrate here on distinguishing the first 
two forms. 

The remains of the Phaleron individuals, like those from many other 
parts of the world, e.g., Peru (Arkush and Tung 2013; Scaffidi and Tung 
2020) and California (Walker 1989), present numerous shallow, healed, 
round to ellipsoid examples of blunt force trauma. Methods for distin- 
guishing accidents from various forms of interpersonal violence include 
mapping locations upon the skull to establish patterns. Walker (1989), for 
example, while investigating ancient human remains from the California 
Coast and nearby islands, found concentrations of healed lesions on the 
frontal bone, especially frequent in males. He considered this compelling 
evidence for ritual encounters, rather than raids or warfare, the latter 
more likely being seen in the perimortem lateral (right>>left) and poste- 
rior fractures at the Crow Creek massacre site (Zimmerman et al. 1981). 
Tung (2007) similarly attributed the healed frontal fractures of the elite at
the Wari La Real site (Majes valley, Peru) to ritualized encounters; these
contrasted with the anterior and posterior fractures of the Beringa community
of non-elite individuals, also from the Majes valley. Tung argued that the
latter pattern reflects raiding, supported by the presence of “parry fractures.” This
approach has been extended temporally by Arkush and Tung (2013), who 
carefully presented alternative scenarios for various kinds of interper- 
sonal violence, ranging from domestic violence to raiding and warfare. 
Scaffidi and Tung (2020), reporting evidence from earlier peoples (200- 
750 CE) from the Majes valley, argued that their elevated rate of facial 
and anterior vault fractures, especially prominent among males, was
evidence of endemic violence. Obviously, the distribution of healed cra- 
nial fractures, their frequency and patterning, and their age/sex associa- 
tion will be important in interpreting the Phaleron remains. 


Perimortem vs Postmortem 

One of the contemporary challenges faced by bioarchaeologists and 
forensic anthropologists alike is distinguishing perimortem from post- 
mortem blunt force trauma, rendered ambiguous due to taphonomic fac- 
tors, especially soil pressure. As noted above, many of the individuals 
interred in the Esplanada mass graves are said to have been executed with 
clubs or similar weapons (Ingvarsson and Bäckström 2019). The 
methods created and applied by Sala and colleagues (Sala et al. 2015, 
2016) for remains from the Middle Pleistocene site of Sima de los 
Huesos (Atapuerca, Spain) appear promising for resolving this significant
problem. Knowledge drawn from forensic anthropology about fracture
biomechanics, fracture healing rates, and the role of taphonomy in
altering bone to mimic antemortem and perimortem processes is essential 
in bioarchaeological studies of trauma (Berryman et al. 2018; L'Abbé et 
al. 2021; Pokines et al. 2021; Wedel and Galloway 2014). Sala’s explicit 
focus on fracture outline (straight, curved, depressed), angle (between 
the external surface and the fractured surface), surface (smooth, jagged), 
and cortical delamination (present, absent) appears suitable for the frontal,
parietal, and occipital bones initially reported for a single individual from 
the Middle Pleistocene Sima de los Huesos (SH) site (Sala et al. 2015). 
The method was slightly modified by the researchers (Sala et al. 2016) in 
a more recent study of 17 SH crania in comparison to other contemporary 
remains. Fracture outlines were scored as either linear, depressed, stel- 
late, or a combination of the latter two, following Wedel and Galloway 
(2014). Trajectory was also added as a category, distinguishing between 
lines that crossed sutures and those that did not. 
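
The attributes listed above lend themselves to a structured recording sheet. The following is only a minimal sketch of how such observations might be stored and tallied, assuming the categories as summarized in this paragraph; the class names, field names, identifiers, and the free-text treatment of fracture angle are hypothetical and are not taken from Sala and colleagues, and the sketch records observations without attempting any perimortem/postmortem classification.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outline(Enum):
    # Outline states as summarized in the text for the 2016 scoring.
    LINEAR = "linear"
    DEPRESSED = "depressed"
    STELLATE = "stellate"
    DEPRESSED_AND_STELLATE = "depressed and stellate"


class Surface(Enum):
    SMOOTH = "smooth"
    JAGGED = "jagged"


# The text names these three bones; other bones are rejected in this sketch.
CALVARIAL_BONES = {"frontal", "parietal", "occipital"}


@dataclass(frozen=True)
class FractureRecord:
    individual_id: str           # hypothetical identifier
    bone: str
    outline: Outline
    angle_note: str              # free text; angle categories are an assumption
    surface: Surface
    cortical_delamination: bool  # present (True) or absent (False)
    crosses_suture: bool         # trajectory: does the line cross a suture?

    def __post_init__(self) -> None:
        if self.bone not in CALVARIAL_BONES:
            raise ValueError(f"unexpected bone: {self.bone!r}")


def tally(records, attribute):
    """Count how often each state of one attribute occurs across records."""
    return Counter(getattr(record, attribute) for record in records)


# Invented example observations, for illustration only.
records = [
    FractureRecord("EXAMPLE_A", "parietal", Outline.DEPRESSED,
                   "oblique", Surface.SMOOTH, False, True),
    FractureRecord("EXAMPLE_B", "frontal", Outline.LINEAR,
                   "right angle", Surface.JAGGED, True, False),
]
print(tally(records, "outline"))
```

Keeping each attribute as a separate, explicitly coded field makes it straightforward to compare frequencies across burial groups before any interpretation of cause is attempted.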

In general, Sala and many others focus on calvaria and generally 
adhere to the “hat brim rule” (HBR), which as commonly applied today, 
asserts that injuries at the level of a hat brim most likely result
from a fall, and those above, from a blow to the head. This definition, as 
promulgated by Kremer and co-workers (Kremer et al. 2008; Kremer and 
Sauvageau 2009) and Guyomarc’h et al. (2010), has been criticized by 
Fracasso et al. (2011) and Geserick et al. (2014), who urge attention to 
the original definition by Kratter (1919) and Walcher (1931). The earlier 
workers argued that fall injuries from a standing position are unlikely to 
appear above the hat brim line, which connects the frontal and parietal 
eminences and the most superior point on the occipital squama, a location
defined in a manner distinct from that of the 21st-century researchers.
This does not apply to falls from a height, blows, or children’s skulls. For 
the Phaleron analysis, we will emphasize the features discussed by Sala, 
without recourse to the contested HBR criteria. 


Postcranial Fractures 

Given the verticality of the Greek landscape, factoring the effect of acci- 
dental falls into the fracture patterns of those buried at Phaleron will be 
important. Lessa (2011), for example, has argued that the many fractures 
observed in Brazilian coastal pre-colonial groups reflected falls blocked 
by the lower limbs and upper limbs, common due to the rocky coastal 
cliffs adjacent to the shore. Trauma rates were assumed to be associated 
with accidental falls within these challenging locations. Differences 
were, therefore, interpreted as reflecting distinctive lifestyles, contrasting
fishers with the shellfish-dependent sambaquis-builders. Postcranial 
fractures were interpreted biomechanically by insult type rather than by 
bone. Accidents associated with a vertical landscape can be assumed in 
this case, in the absence of apparent occupational specialization, other 
than perhaps gender-associated (James and Dillon 2012). This contrasts 
with the accident/occupation-based interpretation by Dittmar and colleagues
(2021) of three Medieval cemeteries in Cambridge, which demonstrated
a significantly higher risk of fractures for the poor. The clerics,
however contemplative they may have been, also apparently suffered 
trauma during their subsistence and maintenance activities. For Phaleron, 
we will want to take deeper dives into the medical and epidemiological 
literature on the nature of fractures from falls, as distinct from those
reflecting blows while standing, for example. One might hypothesize that the
patterning for hoplites (foot soldiers) whose faces and limbs were 
exposed outside protective clothing and gear would be distinctive. 
Similarly, the association of equestrian activities with the elite suggests 
that if indeed there are elite among those subject to execution and burial 
at Phaleron, the impact of horseback riding would be found in their 
remains. Linking diagnostic trauma patterns to occupational cues from 
bone shape, entheseal remodeling, and other forms of pathology will 
undoubtedly strengthen analyses and perhaps distinguish land-based 
warriors from those who cultivated the land, who mined its mineral 
riches, and who sailed the seas, as either rowers, merchants, or pirates. 

The need for care in defining postcranial fractures prior to
attributing cause is writ large in the fraught example of the “parry frac- 
ture,” an eponym too frequently applied to forearm fractures (Judd 2008; 
Jurmain 1999; Lovell 1997; Smith 1996). A very careful definition and 
exemplary cases reported by Judd (2008) illustrate the need to distin- 
guish between forearm fractures caused by indirect force, e.g., falling 
(Colles, Smith’s, Galeazzi, and paired rotational fractures), and directed 
blows to the forearm. Defending a blow with the forearm most com- 
monly fractures the ulna transversely; the radius may also be involved. 
Carefully defining fracture dynamics of direct and indirect causation, 
along with knowledge of contexts for likely falls, accidents, occupational 
stressors, or violence, is essential to interpreting fractures, individually 
and epidemiologically. 


Repetitive Behaviors Reflecting Occupational Specialization 


In the activity-related changes attributed to repetitive activities, we are 
faced with further challenges, as we must first consider the evidence 
linking the skeletal change to activity. If we are seeking evidence of 
occupational specialization, how likely is the dominant risk factor for 
osteoarthritis (OA), for example, to be repetitive behavior? What is the 
range of risk factors for osteoarthritis? How confident can we be that 
activity is the likely causal agent for OA in general and how might this 
vary across body regions? In the case of bone shape changes and enthe- 
seal modifications, we would like to know how long the body needs to 
perform the repetitive activity for it to affect bone and how long the
changes will persist following cessation of the behavior. As with OA, what
other factors affect bone shape and entheseal modifications and how 
likely is repetitive behavior to be the primary cause? Finally, is there a 
likely difference in the bony expression—either in form or degree— 
depending upon whether the activity began during adolescence rather 
than adulthood? Clinical knowledge must be integrated with bioarchaeological
investigations of activity, as Jurmain (1999; see also Jurmain et
al. 2012) has so eloquently cautioned, reacting against exuberant 
attempts to infer occupations, following a simple cause and effect model 
for relationships between bony changes and occupations, e.g., Capasso et 
al. (1999). 


“Activity is rarely rigorously defined in the anthropological litera- 
ture. Firstly, is it habitual or exceptional? Although most studies 
tacitly assume that the relevant bone changes result from habitual 
activity, the influence of exceptional, acute behavioral episodes, as 
reflected in traumatic lesions, has occasionally been discussed under 
the broad topic of activity reconstruction (e.g., Jurmain 1999; Walker 
et al. 2009; see Chapter 20 by Judd and Redfern in this volume 
[2012]). In addition, activity needs to be more fully and more clearly 
characterized in terms of duration (total exposure time), frequency 
(number of repetitions per unit time) and mechanical overloading 
(Luttmann et al. 2003). Other factors such as intensity, age of onset, 
and postural demands need to be considered. Ideally, this broadened 
understanding should be concordant with clinical definitions.” 
(Jurmain et al. 2012: 532) 


We, therefore, proceed to consider critically (but optimistically) 
osteoarthritis (OA), entheseal modifications (EM), and bone 
shape/density/cross-sectional diameters. 


Osteoarthritis (OA) 


The need for conservative approaches to the interpretation of OA has 
been emphasized by Bridges (1992), Jurmain (1999), Jurmain et al. (2012),
Waldron (1995, 2012, 2019), and Wallace et al. (2017b), among others. 
Tony Waldron, for example, has persistently argued, based upon clinical 
and epidemiological data, for an appreciation of the multiple genetic, 
activity-related, and individualizing features, such as age, sex, obesity, 
and joint shape, that may stimulate the breakdown of joints and the ensuing bony
OA. This multifactorial etiology continues to be represented in the recent 
clinical literature. Genetic risk factors for joint breakdown have been 
identified—with weight-bearing being considered separately (Boer et al. 
2021)—in an article that also defined risks specific to females, individual 
joints, and age groups. Risk factors have been considered separately for 
different individuals and across joints. 


“Person-level risk factors with strong evidence regarding osteoar- 
thritis incidence and/or progression include age, sex, socioeconomic 
status, family history, and obesity. Joint-level risk factors with strong 
evidence for incident osteoarthritis risk include injury and occupa- 
tional joint loading; the associations of injury and joint alignment 
with osteoarthritis progression are compelling. Moderate levels of
physical activity have not been linked to increased osteoarthritis risk. 
Some topics of high recent interest or emerging evidence for associ- 
ation with osteoarthritis include metabolic pathways, vitamins, joint 
shape, bone density, limb length inequality, muscle strength and 
mass, and early structural damage.” (Allen et al. 2015: 276) 


Also of interest are studies from sports medicine, as we shall review 
more extensively below in reference to cross-sectional diameters. This 
literature provides case studies related to specific sports and surveys, 
such as a recent review and meta-analysis by Bestwick-Stevenson et al. 
(2021) that considered the following activities: American football, 
bobsleigh, handball, ice hockey, shooting, and wrestling. The hip, knee 
and ankle were considered, and a statistical review in relationship to con- 
trols for all sports found more OA at all three joints. Ice hockey athletes 
and wrestlers were at risk for hip and knee OA, while handball players 
developed hip arthritis. 

Turning to bioarchaeological reviews of occupational associations, 
we also emphasize that while risk factors for joint degeneration are 
higher in certain groups, there is no simple relationship: 


“There have been a great many studies of OA in modern occupational 
groups, usually with results that are entirely unsurprising; that is to 
say, miners tend to have a greater than normal prevalence of OA of 
the spine, carpet layers of the knee, ballet dancers of the foot, and so 
on. There are, in addition, some results that are not so obvious, the 
most convincing being that farmers have a much greater frequency of 
OA of the hip than the general population . . . and it has to be remem- 
bered that by no means all those who engage in hard physical work 
get OA and, conversely, that those who lead entirely sedentary lives 
may do so. 

The most important features of all this endeavor, however, are that 
there is absolutely no form of OA that is unique to one occupational 
group, and secondly, that even in those occupations in which there is 
a greatly increased risk of developed OA of a particular site, there are 
many more individuals outside that occupation with the condition 
than inside it. Thus, although there seems little doubt that farmers are 
greatly at risk of developing OA of the hip, the majority of those with 
the condition are not farmers.” (Waldron 2012: 519-520)


One of the most compelling arguments for the multitude of current risk
factors for OA appears in Wallace et al.’s (2017b) longitudinal study of
prehistoric and historic (pre vs. post mid-20th century, characterized as
early industrial vs. postindustrial) skeletal remains. This study
illustrates that age is insufficient to explain the current marked
increase in knee OA since 1976. The authors suggest that OA may be due to
modifiable factors such as BMI and activity levels, thus joining type 2
diabetes, atherosclerosis, and hypertension as possible “evolutionary
mismatches,” biological results of human bodies being imperfectly adapted
to contemporary lifestyles.

Jurmain et al.’s (2012: 534) persistent caution that there is no support 
for assuming a simplistic relationship between habitual activity and OA, 
including both joint degeneration and bony changes, therefore, continues 
to be compelling, especially in the face of recent experimental and clini- 
cal evidence. 

This ambiguous relationship between observable bony changes and 
repetitive activity, along with the variable preservation of joint surfaces 
at Phaleron, leads us to only consider joint degeneration when it has been 
noted to be associated with a particular behavior or type of behavior we 
are modeling and in concert with other forms of bony change in individ- 
uals. We emphasize that such evidence is compelling in reference to spe- 
cific activities only when carefully contextualized (e.g., Merbs 1983). 


MSMs and Entheseal changes 


As Kennedy (1998: 305) notes, studies of activity-related stress began 
during the Middle Ages, with medical diagnoses related to military serv- 
ice and trade assuming prominence, followed by discussions of industrial 
medicine in 1700 (Kennedy 1989). Within bioarchaeology, histories of
“musculo-skeletal markers” (MSMs), more recently termed “entheseal
changes” (ECs), relate that enthusiasm for using these alterations of
areas of tendinous attachment as attributes associated with activities or
specific occupations started during the 1960s (Henderson and Alves
Cardoso 2013; Jurmain 1999, 2012; Kennedy 1989; Pearson and
Buikstra 2006; Sick 2020). Influenced initially by the work of Angel
(1966, 1971) and subsequent twentieth century scholarship (e.g., Bridges 
1989; Capasso et al. 1999; Kennedy 1989; Kennedy et al. 1986; Merbs 
1983), enthusiasm for “Bioarchaeology’s Holy Grail” (Jurmain et al. 
2012) led to a 1997 symposium at the annual meeting of the American 
Association of Physical Anthropologists (AAPA) in St. Louis and a sub- 
sequent symposium volume in the International Journal of 
Osteoarchaeology (IJO) (Peterson and Hawkey 1998). In the preface, the 
editors emphasized that much remains to be learned about MSMs: 


“A number of factors still need to be addressed in this relatively new 
approach, including an understanding of the role and rate of bone 
remodeling, the effect of hormonal differences and pathological 
agents on bone growth, and how biomechanical variables (including 
the role of individual variation in muscle attachment, muscle fiber 
arrangement, and origin/insertion type) may affect musculoskeletal 
stress markers.” (Peterson and Hawkey 1998:303)


The organizers also argued that the “next logical step in this field of 
enquiry is to generate predictive models against which to test the data, 
focusing on the use of all joint complexes in an individual” (Peterson and
Hawkey 1998:303). While most of the papers in the symposium volume 
were case studies, these were linked to critiques (e.g., Stirland 1998) and 
explicit statements about how far to take interpretations. Robb (1998) 
argued for patterning differences that emerged through cluster analyses 
of Iron Age samples from Italy. Hawkey and Merbs (1995) illustrated a 
detailed case study of “care” that anticipated 21st century excitement over
osteobiographies and the bioarchaeology of care. Another promising 
direction was illustrated by Churchill and Morris’s (1998) use of the 
Optimal Foraging Dietary Breadth model to make predictions about mus- 
cle scar rugosity and size for muscles of the upper and lower limbs across 
ecozones for pre-contact foragers from South Africa. The authors con- 
cluded that food foraged by males differed more across the ecozones than 
that foraged by females, based upon significant differences across male 
upper limbs. That sex-based activities other than explicit food acquisition 
could be implicated does not detract from the elegance of a study 
grounded in theoretical expectations, based on detailed contextual 
knowledge. 

During the first decade of the 21st century, methodological concerns
mounted. Bioarchaeologists did learn a great deal more from anatomists 
about the anatomical structure of entheses (Benjamin et al. 2002, 2006; 
Benjamin and McGonagle 2009; Villotte and Knüsel 2013), and decisions
about collecting data across the spectrum of attachment sites, which range
from fibrous to fibrocartilaginous, required attention, with the latter
being preferred due to lack of knowledge about the relationship between
the former and activity (Villotte and Knüsel 2013).
Concern about the unknowns affecting the unrealized potential of MSMs 
for the reconstruction of activity led to a Workshop in Musculoskeletal 
Stress Markers on July 2-3, 2009, sponsored by the Research Centre for 
Anthropology and Health at the University of Coimbra, Portugal 
(http://cias.uc.pt/workshop-musculoskeletal-stress-markers-msm/). The 
outcomes of this conference were many and productive. Given the need 
for in-depth consideration of terminology, scoring protocols, and the 
relationship between activity and the form taken by MSMs, including
tests in skeletons of documented occupations, three working groups were
formed, one on each of these topics. Papers from the Workshop were posted
online (http://www.uc.pt/en/www.uc.pt/en/cia/msm/MSM_Occupation- 
cia/msm/msm after) and a summary was presented (Santos et al. 2011). 
A further review of progress (Jurmain et al. 2012) was followed by a 
poster symposium at the 2012 annual meeting of the AAPA in Portland, 
Oregon. Results of the symposium, which largely reflected progress achieved
by the Coimbra Workshop Working Groups, were published the following
year in the IJO (http://www.uc.pt/en/cia/msm _ after) (Henderson and
Alves Cardoso 2013). Some outcomes have been readily implemented, 
for example, using the term “entheseal changes” or “EC” to reference the 
attachment sites, rather than “MSMs” or “MOS” (markers of occupational
stress). EC was viewed as being a neutral term for describing the changes
being recorded (Jurmain and Villotte 2010). While no visual scoring system
has become the universally accepted consensus method (cf. Havelkova et al.
2013; Wesp 2014, 2021), the Coimbra method’s relatively high repeatability
and relative ease of training (Wilczak et al. 2016) present decided
advantages (Henderson et al. 2013a, 2016, 2017). Other desirable features
include its assimilation of the strengths of Villotte et al.’s (2010)
anatomical and clinically informed approach, integrated with Mariotti’s
clearly identifiable attributes, scored along specified dimensions of
variability (Mariotti et al. 2004, 2007; Milella et al. 2012), which
appears to be the most rigorous method for EC to date (Henderson et al.
2013a; Henderson et al. 2016). Imaging methods continue to be explored
(Karakostis and Lorenzo 2016; Karakostis et al. 2018; Nikita et al. 2019;
Nolte and Wilczak 2013).

A persistent problem in EC studies is how best to address activity 
levels in relationship to occupation. One of the Coimbra Workshop task 
forces faced that issue directly in their studies of EC in skeletal collec- 
tions wherein occupation was documented (Perréard Lopreno et al. 
2013). They concluded that previous studies have focused primarily 
upon biomechanical (manual/nonmanual) categories—which seem rel- 
atively consistent across researchers—and sociocultural categories— 
which do not (Alves Cardoso and Henderson 2013). Alves Cardoso and 
Henderson’s (2013: 1194) advice, that “research should not blindly rely 
on occupation at death to test the relationship between EC and
occupation”, is an important outcome of these deliberations. Related
constraining issues include the fact that many women’s occupations in
these 19th and early 20th century circumstances are not differentiated
beyond
“housewife” (Milella et al. 2015; Villotte 2009). Age-at-death and body 
mass have also emerged as important variables associated with EC pat- 
terning in documented collections (Alves Cardoso and Henderson 2013; 
Godde et al. 2018; Jurmain et al. 2012). Other ambiguities develop from 
distinctive cultural differences in defining occupations and failure to 
record time depth for occupations (Henderson et al. 2013b). The Basel 
Spitalfriedhof Collection is an exception to the latter limitation, as dis- 
cussed below (Hotz and Steinke 2012; Karakostis et al. 2017). In general, 
those working with documented collections approach consensus in terms 
of “biomechanical” categories—e.g., manual vs. nonmanual workers 
(www.uc.pt/en/cia/msm/MSM Occupation) — or similar distinctions 
based upon muscle groups—e.g., Wesp 2014, 2021. Milella et al. (2015), 
for example, in an inductive approach using merged Italian and Portu- 
guese collections, report distinctions between three main groups, with the 
first and third providing the clearest contrasts: 1) occupations related to 
farming, 2) physically demanding occupations not related to farming, 
and 3) physically undemanding occupations. The authors argue for 
methods “not constrained by a priori assumptions in testing biocultural 
hypotheses” (Milella et al. 2015: 222). While this is excellent advice for 
exploratory efforts at Phaleron, seeking evidence relating to specific pro- 
fessions is also desirable, e.g., rowers (discussed in the closing sections). 

An especially promising approach to exploring behavior differences
and muscle synergies has recently been developed by Karakostis and col- 
leagues (Karakostis and Lorenzo 2016; Karakostis et al. 2017, 2018, 
2019; Karakostis and Harvati 2021) through a careful series of empirical 
tests that include natural and laboratory experiments, along with geomet- 
ric morphometrics (Karakostis et al. 2018). Validated in the Basel Spital- 
friedhof Collection (Karakostis et al. 2017), wherein lifetime occupa- 
tional information along with standard demographic information are doc- 
umented (Hotz and Steinke 2012), the procedure maps multivariate pat- 
terns from hand entheses of archaeological samples upon documented 
long term occupational patterns. This comparative approach, termed the 
V.E.R.A. method, has been applied to a small sample of remains from the
Phaleron cemetery with promising results, which suggest distinctive life 
histories for individuals buried in different interment contexts (Karakos- 
tis et al. 2021). 

Therefore, however enticing the identification of an unknown individual
with a specific occupation via EC, this goal should be approached with
caution, as empirical support is lacking. Osteobiographies or other
studies that infer specific occupations from ECs should, therefore, be
considered aspirational, rather than conclusive (e.g., Angel and
Caldwell 1984; Kennedy 1983, 1989; Kennedy et al. 1986). The more
conservative approach of mapping observed differences onto patterning
validated through natural and laboratory experiments, such as V.E.R.A.,
however, appears quite promising.


Long Bones (LB): Diaphyseal Shape, Bone Density, and Cross-sectional Diameters 


As Pearson and Buikstra (2006) emphasize, studies of bone shape and 
adaptation to environmental and ontogenetic factors can be traced back 
to the late 1800s, when European pathologists and anatomists, such as 
Virchow, focused on plasticity. A very influential product of this tradition 
was Julius Wolff’s “law”, codifying how bone responds to external 
stressors (Wolff 1892; see also Frost 1993; Martin et al. 1998; Ruff et al. 
2006). Following late 19th and earlier 20th century focused observations
and interpretations of platycnemia and platymeria (e.g., Matthews et al. 
1893), complex biomechanical models assumed prominence in bioar- 
chaeological studies, beginning with Ruff and Hayes (1983a, 1983b), 
Ruff et al. (1984), and Bridges (1985, 1989). Early studies described dif- 
ferences, such as a decline in femoral strength with agriculture (but see 
Bridges, 1989). Further comparisons focused on upper limb asymmetry 
and differences between the sexes. 

Studies of cross-sectional geometry have been criticized on experi- 
mental grounds, with certain loading parameters not behaving in the pre- 
dicted manner (Lieberman et al. 2004). Lovejoy et al. (2002) argued 
against simple models for explaining variation in bone shape and compo- 
sition, while Jurmain (1999, Jurmain et al. 2012) emphasized the need for 
clinical or other controlled studies that linked cross-sectional geometry to 
specific activities. More recently, Wallace et al. (2017a: 234) have voiced
concerns about several factors: the weak relationship between loading 
and bone structure, the mechanical inefficiency of bone’s response to 
loading, the comparatively large influence of genes on bones’ responsiv- 
ity to loading, and age-dependency of responsiveness. The last-men- 
tioned topic is considered below, as it may facilitate a window for 
addressing occupational specialization during pre-adult and early adult 
years. 

Pearson and Lieberman (2004) have reviewed experimental evidence 
on the ontogeny of bone formation, which suggests that adolescent and, 
to a lesser extent, childhood activities may exert a disproportionate
influence on adult bone size and shape. Thus, studying skeletons with a
sensitivity to age-at-death and including older juveniles may provide 
important indications concerning apprenticeship and assumption of adult 
identities. This point has also been emphasized by Wallace et al. (2017a). 

Studies with important implications for bioarchaeological interpretations
of behavior are drawn from sports medicine. As Longman et al.’s (2020)
review of “human sport paleobiology” underscores, studies that have
emerged from the sports science fields hold excellent potential for
exploring human evolutionary adaptation at the species, inter-individual,
and intra-individual levels. Enhanced by increasingly precise imaging
modalities, this research has now extended over more than a quarter
century and has addressed issues of significance to bioarchaeologists
seeking to explore the past of humankind.

Most investigations, both in sports science and bioarchaeology, have 
focused on cortical bone, especially femoral and humeral cross-sectional 
properties (size, strength, rigidity). Clinical and experimental results 
have been used to generally address mobility patterns and activity 
symmetry in archaeological examples. Pioneering bioarchaeological 
studies include those of Bridges (1989), Ruff (1987), and Ruff et al. 
(1984). Many such investigations compared skeletal samples across
perceived adaptive thresholds (e.g., hunter-gatherers vs. agriculturalists,
or Late Eneolithic vs. Early Bronze Age; Sladek et al. 2007). Especially
satisfying are the reports that display extensive knowledge of the 
archaeological record and construct testable hypotheses. A recent, 
impressive example of temporal and gender-based expectations for Neo- 
lithic, Bronze Age, and Iron Age groups from Central Europe is found in 
Macintosh and Stock (2019) and Macintosh et al. (2014, 2017). Based on 
a carefully designed study, this research well-illustrates our gender- 
based, or perhaps gender-biased, contemporary perspective on gender 
roles in the past: Macintosh et al. (2014: 1) anticipated that: “Significant
differences in upper limb asymmetry and variability will be found 
between the Early/Middle Neolithic and Early/Middle Bronze Age 
groups, associated with greater agricultural efficiency, the expansion of 
mining and copper and bronze metallurgy, the manufacture and produc- 
tion of metal objects and other crafts, and the increased task specializa- 
tion that accompanied these changes. Given the considerable overlap of 
bronze and iron production in Central Europe, reduced temporal change
in humeral asymmetry between the Early/Middle Bronze Age and Iron 
Age groups is expected.” The authors (p. 1) discovered that “the intro- 
duction of the ard and plow, metallurgical innovation, task specialization, 
and socioeconomic change through 5400 years of agriculture impacted 
upper limb loading in Central European women to a greater extent than 
men.” Such results importantly suggest that rethinking our perspectives 
on ancient daily lives and women’s work will benefit from further such 
contextually sensitive and rigorous study. 

Revising our visions of age and gender-based roles in the past is also 
advisable for our interpretations of strength in ancient bones. Sage advice 
on studies of bone strength and health in the past includes Agarwal’s 
(2021) perspective on current idealized bone “norms.” She (Agarwal 
2021: 3) compellingly calls for researchers to “critically reflect on which 
measures of bone loss in the different parts of the skeleton are actually 
biologically and/or socially meaningful, and to call for greater considera- 
tion of the cumulative and fluid biocultural influences on the skeleton 
over the life course beyond sex and age.” This is exceptionally good 
counsel as we integrate new imaging modalities and the study of bone 
trabeculae increasingly into our research designs. 

Of importance to our research on occupation in the past is a suite of 
studies that demonstrate that bone mass gained during the second decade 
of life appears to persist for years. Such research has been stimulated by 
a concern for forming and maintaining bone strength into the older adult 
years, and relatively few longitudinal studies have been published. Use- 
ful examples include longitudinal investigations of young children who 
participated in high impact (jumping) exercise vs. those who stretched. 
Followed for eight years, the jumpers maintained elevated bone mineral 
content (BMC) at the hip (Gunter et al. 2008). More recent studies of 
impact loading (IL) suggest that IL prior to menarche is associated with 
postmenarche diaphysis size and strength increases in women (Murray 
and Erlandson 2021). Longitudinal study of pre-menarchal gymnasts, 
who left the sport before menarche, found that the gymnasts sustained 
higher forearm BMC, forearm areal bone mineral density (aBMD), and 
area for at least two years after menarche, compared to controls (Scer- 
pella et al. 2010). More recent data on gymnasts of 4 to 6 years of age 
indicate that changes are maintained in the distal radius, but not on the 
tibia or on the radius diaphysis. Warden et al. (2014) argue that certain 
aspects of bone strength in adult baseball players are sustained life-long. 
Summary statistics for women emphasize that 80-90% of adult bone 
mass is attained by 16 years of age, with nearly half accrued during the 
four years surrounding menarche (Troy et al. 2018). 

Studies have also suggested nuanced associations within specific 
bones and behaviors. The forces associated with overhand throwing in 
women softball players induce more than twice the dominant-to-nondominant
differences in midshaft humeral bone mass, structure, and estimated
strength seen in windmill (underhand) throwers, with all throwers
showing more of these attributes compared to controls (Bogenschutz et 
al. 2011). Best and coworkers (2017) distinguished patterns of calcaneus 
trabecular thickness between forefoot and rearfoot striking male runners, 
although the small sample and intervening variables suggest calcaneus 
trabecular thickness was most likely explained by running distance and 
years of running. These, along with experimental studies (e.g., Ju and 
Sone 2021), reaffirm that bone mass varies across exercise regimes 
through different architectural patterns. Ju and Sone’s rats were reported
to have thicker trabeculae due to jumping, while running and swimming
increased trabecular number. Enthusiasm for studies of bone trabeculae
must be tempered by a lack of empirical knowledge explicitly linking
variation in density and orientation to distinctive activity regimes and by
the tendency for postdepositional factors to significantly alter outcomes.

In sum, experimental laboratory studies and longitudinal research in
human athletes indicate that the skeletal effects of repetitive
activities, especially those undertaken during adolescence, can still be
found in the bones of individuals living into older years. Similarly,
adult athletes present evidence of increased bone
mass and strength associated with their sport. To date, such knowledge 
has been used by bioarchaeologists to explore longitudinal differences 
between groups in mobility and symmetrical, task-specific behaviors. 
Results have also suggested that predictions about gender roles in the 
past may require revisiting and refinement, based upon historical, ethno- 
graphical, and archaeological sources. It will be important to develop 
explicit expectations about activities and occupations prior to employing 
increasingly advanced imaging modalities, as these become available. 
Among the implications of the youthful development of bone mass,
based upon activities, is that bioarchaeological studies of activity and
occupational specialization should begin with remains of adolescents,
interpreted in light of context-specific information about training,
apprenticing, and assumption of adult roles.


DISCUSSION 


We now return to the questions posed at the outset of this paper. 
What can we say, and how accurate can we be? 


Are we able to distinguish fractures attributable to interpersonal violence from 
those that result from nonviolent activities? 


The answer to this question is tentatively affirmative, depending upon the 
type and location of skeletal alterations. Sharp force trauma observed 
thus far in the Phaleron collection is readily attributable to inter-personal 
violence, as are the repetitive examples of blunt force trauma. Other 
cases will be interpreted on a case-by-case basis, with careful attention to 
environmental and cultural contexts. 


What attributes accurately identify perimortem trauma, as opposed to
postmortem bone alterations?


This distinction continues to pose challenges for forensic anthropologists 
and bioarchaeologists, alike. Especially challenging in the Phaleron 
example is fracture location, as many of the possible perimortem frac- 
tures occur due to blows to the lateral aspects of the cranial vault. Most 
models for identifying perimortem trauma are based on evidence from 
the dense portions of the frontal and occipital bones, along with the parie- 
tal bones. We continue to generate and interrogate experimental studies 
and forensic casework to resolve this issue. 


Can we identify those engaged in heavy labor, compared to those in less demand- 
ing daily behaviors? 


We are not optimistic about using osteoarthritis (OA) to identify general
patterns of occupational stress in the people of Phaleron. OA may be
useful as supporting evidence and in individual osteobiographies. Studies
of entheseal remodeling (EM) and Long Bones (LB; Diaphyseal Shape, Bone
Density, and Cross-sectional Diameters) appear to be empirically grounded
in data from documented collections, experimental evidence, and sports
medicine. These forms of bone response to activity stress will comprise
the focus of the Phaleron Bioarchaeological Project.


Can we proceed further, with at least a few occupations associated with distinc- 
tive suites of behaviors? 


This goal is vastly more challenging if we attempt to move beyond gen- 
eral discussions of heavy labor vs. less demanding occupations. Many 
EM and LB changes have been proposed to link to specific behaviors
(e.g., Capasso et al. 1999; Kennedy 1989; Kennedy et al. 1986).
Even so, establishing that a condition occurred in one or a few individ- 
uals from a defined occupation group at a specific place and time does 
not permit the association of the unknown occupation of an individual 
with the presence of that condition. Similarly, a theoretically generated 
model for associating muscle stress with EM and LB changes requires
experimental testing (in the laboratory or in documented collections) prior
to acceptance.

In a contextually grounded study, sensu Hawkey and Merbs (1995), 
Thomas (2014) has identified a suite of ECs that are statistically more 
common in males buried with arrowheads in the Cerny culture Neolithic 
sites of western Europe than in those without this burial accompaniment. 
Such changes are “compatible with medical data on present-day archers” 
(Thomas 2014: 287). This is strong circumstantial evidence for identify- 
ing hunters within these early agricultural communities. 

A profession reliably linked to specific changes is that of ballet,
which is not referenced in the bioarchaeological or forensic
anthropological literature (Huwyler 2007; Prisk et al. 2008). As Schneider et al.
(1974: 628) report: 


“Among the various abnormalities, some similar to those found in 
athletes, were specific patterns of stress hypertrophy of the femora, 
tibiae, fibulae, and the first three metatarsal bones, and multiple 
stress fractures of the femoral necks and tibiae. This group of findings 
is sufficient to identify the classical ballet dancer.”


Schneider et al. (1974) also review previous studies reporting this 
phenomenon, all dating to the middle of the 20th century, and many
written in Russian. More recent sources (e.g., Prisk et al. 2008)
also emphasize the enormous stresses on the forefoot, and that ballet 
dancers are both artists and athletes. 

While we are not anticipating ballet dancers among our Phaleron 
remains, we are inspired by Schneider et al.’s (1974) methodology, which
explicitly defines primary and secondary features in the suites of
maladies common to ballet dancers, both male and female. The
researchers report that patterns of stress hypertrophy and stress fractures 
are uniquely linked to ballet dancers, while incidental findings include 
chip fractures, dislocations, osteochondritis dissecans, meniscus injury, 
mild OA in younger individuals that increased with age, calcareous peri- 
tendinitis, bunions, and calluses. 


Which skeletal attributes are the most reliable for characterizing bone altering 
behaviors, having been tested in other documented contexts?


The ability of entheseal alterations, especially the V.E.R.A. approach 
(Karakostis and Lorenzo 2016), and various bone shapes and densities to 
reflect activity is impressive. Identifying occupations thus requires a
constellation of features, anchored by LB and EM, which, supplemented by
information from OA and fractures, holds the greatest potential for 
exploring occupation and activity at Phaleron. 


Are we able to identify (groups of) adolescent-young adult individuals whose 
daily lives suggest that occupations are being defined through apprenticeship at 
an early age? 


This approach holds potential and should be explored. It will be limited 
by the relatively few adolescent deaths at Phaleron, as is typical of any 
human group. The relatively large numbers of young adult males should 
be a source of relevant information, however. 


Can we identify rowers at Phaleron?


Informed by this review of activity-related changes in bone shape and 
pathology, we hope to engage in a multivariate approach for identifying 
the range and frequency of physically demanding behaviors and occupa- 
tions at Phaleron. We will use inductive procedures, such as cluster anal- 
yses, to identify bony alterations that seem to be uniquely associated, 
including fractures, EM, and LB. In addition, we will establish a priori 
constellations of core changes that identify certain occupations and 
establish whether these co-occur more commonly than one would expect 
by chance. 

For example, a survey of the sports medicine literature and other clin- 
ical sources suggests that there is a suite of features commonly associated 
with rowing, including spondylolysis (Rumball et al. 2005; Soler and 
Calderon 2000); other lower back maladies, including disc herniation 
(Hosea and Hannafin 2012; Karlson 2012; Rumball et al. 2005; Wilson et 
al. 2010); upper back degeneration (Wilson et al. 2010); rib stress frac- 
tures (Hosea and Hannafin 2012; Karlson 2012; McDonnell et al. 2011; 
Rumball et al. 2005; Warden et al. 2002; Wilson 2010); sacro-iliac dys- 
function (Rumball et al. 2005; Timm 1999); knee injuries, including 
patellofemoral and iliotibial band symptoms (Hosea and Hannafin 2012; 
Karlson 2012; Rumball et al. 2005); wrist conditions, such as tendinitis 
(Karlson 2012; Rumball et al. 2005); and shoulder pathology, such as sta- 
bility issues and possible dislocation in younger athletes (Karlson 2012; 
Rumball et al. 2005). 

In sum, rib stress fractures are the pathological condition with
associated skeletal changes most commonly reported in rowers. Other
rib-related maladies include costochondritis, costovertebral joint sublux- 
ation, and intercostal muscle strains (Rumball et al. 2005). The full suite 
of pathological change reported above (2.1.1.4) will establish our com- 
prehensive expectation range for identifying rowers on the basis of attrib- 
ute clusters, with rib fractures, lower and upper back pathology, and 
shoulder degeneration comprising the core for a priori study. 
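
The a priori test described above, whether a core suite of changes
co-occurs more commonly than one would expect by chance, could be framed
as a permutation test. The following Python sketch is offered only as an
illustration under stated assumptions, not as the project's established
procedure. The trait names, the threshold of three of four core traits,
the function names, and the data are all hypothetical, and each trait is
assumed to be scored simply as present (1) or absent (0) for every
individual.

```python
import random

# Hypothetical trait names standing in for the a priori "core" suite.
CORE_TRAITS = ("rib_fracture", "lower_back", "upper_back", "shoulder")


def count_cooccurrence(records, traits, min_traits):
    """Count individuals showing at least `min_traits` of the listed traits."""
    return sum(1 for r in records if sum(r[t] for t in traits) >= min_traits)


def permutation_p_value(records, traits=CORE_TRAITS, min_traits=3,
                        n_perm=2000, seed=0):
    """One-sided permutation test for co-occurrence of binary traits.

    Each trait column is shuffled independently across individuals, which
    keeps every trait's overall frequency fixed but breaks any association
    between traits. Returns (observed_count, p_value).
    """
    rng = random.Random(seed)
    observed = count_cooccurrence(records, traits, min_traits)
    columns = {t: [r[t] for r in records] for t in traits}
    hits = 0
    for _ in range(n_perm):
        for t in traits:
            rng.shuffle(columns[t])
        shuffled = [{t: columns[t][i] for t in traits}
                    for i in range(len(records))]
        if count_cooccurrence(shuffled, traits, min_traits) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


def example_records():
    """Synthetic data for illustration only (not Phaleron observations)."""
    def person(**present):
        return {t: int(present.get(t, 0)) for t in CORE_TRAITS}
    full_suite = [person(rib_fracture=1, lower_back=1, upper_back=1,
                         shoulder=1) for _ in range(6)]
    rib_only = [person(rib_fracture=1) for _ in range(4)]
    back_only = [person(lower_back=1) for _ in range(4)]
    unaffected = [person() for _ in range(26)]
    return full_suite + rib_only + back_only + unaffected


if __name__ == "__main__":
    observed, p = permutation_p_value(example_records())
    print(f"individuals with >= 3 core traits: {observed}, p = {p:.4f}")
```

A small p-value in such a test would indicate only that the traits cluster
within individuals more often than their separate frequencies predict. It
would not, by itself, tie the cluster to rowing. The sketch also ignores
unobservable (poorly preserved) elements, age-at-death, and sex, all of
which would need to be handled in any real application.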

The nature of the rowing equipment doubtless affects the frequencies 
of human skeletal pathologies (Karlson 2012). A comprehensive suite of 
additional equipment, training, and environmental factors affecting 
frequencies of rib fractures has been reported by Warden et al. (2002). We 
must be sensitive to these issues in assigning rowing or any other occupa- 
tion to the people of Phaleron. 


CONCLUSIONS 


In conclusion, we cautiously assert that there is significant potential for 
identifying individuals with both general and specific activities in the 
past. Most useful would seem to be approaches based on 3D images ana- 
lyzed according to a range of known occupations, as developed by the 
V.E.R.A. approach, along with the underdeveloped links between 
advanced imaging methods and sports medicine. Fractures hold potential
as well, especially when distinctions between accidental and interper- 
sonal, and perimortem and postmortem causes can be further established. 
Advances in both methodology and rigorous application should be sought
and eagerly embraced across all possible means of estimating activities
and occupations in the past.


ABSTRACT


Knee osteoarthritis is commonly thought to be caused by joint tissue wear and 
tear produced by physical activity. Activities that subject knees to repetitive 
impacts characterized by high rates of loading are believed to be especially 
harmful. Here, we present an alternative hypothesis that physical activity, rather 
than necessarily being bad for knee tissues, may help prevent or attenuate knee 
osteoarthritis, including activities involving high rates of loading. We experimen- 
tally tested this hypothesis using guinea pigs as a model system. To simulate a 
physically inactive lifestyle, animals were housed for 22 weeks in small cages that 
restricted their mobility, while two other groups of animals were housed in one of 
two large rooms that promoted physical activity. One room had a stiff floor to 
engender high rates of hind limb loading, whereas the floor in the other room was 
cushioned to engender low rates of hind limb loading. After the experiment, we 
found that knee osteoarthritis degeneration was significantly greater among the 
physically inactive animals than among the physically active animals in both the 
stiff- and cushioned-floored rooms. These results support our hypothesis and 
challenge common assumptions about the effects of physical activity and impact 
loading rate on knee osteoarthritis. 


INTRODUCTION 


Knee osteoarthritis (OA) is a debilitating disease involving articular car- 
tilage degeneration coupled with changes in nearby bone and synovial 
tissue. A common perception of knee OA pathophysiology is that
cartilage degeneration is caused by the accumulation of wear and tear
engendered by physical activity throughout life (Brandt et al. 2009; Jurmain
1977; Radin et al. 1972, 1991; Turner et al. 2007). Among the activities 
expected to be most harmful are those that expose knees to repetitive 
impacts characterized by high rates of loading (Brandt et al. 2009; Radin 
2004). These may include athletic activities such as long-distance run- 
ning or football (Driban et al. 2017), as well as everyday activities like 
walking on stiff ground surfaces (e.g., concrete pavement) or in stiff- 
soled shoes (Lafortune and Hennig 1992; Radin et al. 1982; Whittle 
1999). Thus, it has been suggested that to prevent knee OA or delay dis- 
ease progression, activities producing high rates of knee loading should 
generally be avoided, or at least take place on soft ground surfaces 
(Milgrom et al. 1998, 2003), and shoes with cushioned soles should be 
worn (Fernandes et al. 2013; McAlindon et al. 2014; Paterson et al. 
2014). 

Recently, we have challenged this wear and tear perception of knee 
OA by hypothesizing that physical activity, instead of being inevitably 
bad for knee health, may actually help prevent or attenuate knee OA 
degeneration (Berenbaum et al. 2018; Wallace et al. 2017, 2019, 2022), 
including activities involving high rates of loading (Holowka et al. 2021; 
Wallace et al. 2018). This hypothesis was motivated primarily by clues
from human evolution. For >95% of human evolution, all people were 
hunter-gatherers with physically active lifestyles that necessitated walk- 
ing long distances on a daily basis (Kraft et al. 2021; Marlowe 2005), and 
under certain conditions, also frequent long-distance running (Bramble 
and Lieberman 2004; Carrier 1984). In addition, for the vast majority of 
that time, our ancestors exclusively walked barefoot, which is known to 
generate ground reaction forces much more rapidly than walking in 
shoes, regardless of shoe sole stiffness (Holowka et al. 2019; Lafortune 
and Hennig 1992; Wallace et al. 2018). Yet, analyses of skeletal remains 
of ancient hunter-gatherers indicate that knee OA was much less prev- 
alent in the past than today, even after accounting for variation in lifespan 
(Wallace et al. 2017). Given that knee OA levels are higher today than in 
the past, while average physical activity levels are now much lower and 
cushion-soled shoes are ubiquitous, it is reasonable to hypothesize that 
frequent high-rate impact loading may be beneficial, rather than detri- 
mental, for knee health. 

Despite evolutionary reasons to hypothesize that physical activity 
producing high-rate impacts may be less of a risk for knee OA than com- 
monly believed, direct evidence supporting this hypothesis is currently 
limited. We therefore conducted an experiment to test potential links 
between physical activity, impact loading rate, and knee OA using guinea 
pigs as a model system. Guinea pigs are a suitable model because, like 
most humans who get knee OA, they develop the disease idiopathically, 
which makes them appropriate for testing potential inhibitors of knee OA 
degeneration (Bendele 2001). Moreover, previous studies have demon- 
strated that knee OA in guinea pigs is histopathologically similar to that 
of humans (Bendele and Hulman 1988; Kraus et al. 2010). In our
experiment, to simulate a physically inactive lifestyle (which we hereafter refer
to as “sedentary”), growing animals were individually housed for 22 
weeks in small cages that restricted (but did not eliminate) their mobility, 
while two other groups of animals were group-housed in one of two large 
rooms that facilitated voluntary engagement in physical activity. One 
room had a stiff concrete floor to promote high rates of hind limb loading 
during locomotion, whereas the floor in the other room was covered with 
foam cushioning to promote low rates of hind limb loading. Our two 
hypotheses were that compared to sedentary animals, both groups of 
physically active animals would experience less knee OA degeneration 
throughout the experiment, and animals who were physically active on 
the stiff floor would experience no more knee OA degeneration than ani- 
mals who moved on the cushioned floor. Data from the sedentary group 
and physically active group housed in the cushioned-floored room have 
been reported previously (Wallace et al. 2022). 


MATERIALS AND METHODS 
Experimental design 


All procedures were approved by the IACUC of Harvard University. 
Male Hartley guinea pigs (n=45) were acquired from Charles River 
Laboratories (Wilmington, MA, USA) at 7 weeks of age. Animals were 
randomly divided into a sedentary group and two physically active 
groups (n=15/group). Sedentary animals were housed individually in 
small cages (width x length: 27 x 48 cm) with wood-shavings bedding. 
Physically active animals were group-housed in one of two large rooms 
(width x length: 183 x 244 cm) with different types of flooring. One 
group was housed in a room with a stiff, epoxy-coated concrete floor, and 
the other group was housed in a room with the same concrete floor cov- 
ered with foam cushion flooring material (thickness: 11 mm, Young's 
modulus: 1.6 MPa; Eco-Soft Plus tiles, Rubber Flooring Inc., Mesa, AZ, 
USA). Both rooms had a ceiling-mounted HDCVI camera to record 
physical activity (model: CSP-CVIED2-B, CCTV Security Pros, Cherry 
Hill, NJ, USA). The floors of both rooms were scrubbed and cleaned 
daily, which took approximately 30 min per room to complete, during 
which time the animals were kept in large plastic bins with wood-shav- 
ings bedding. The foam cushion flooring material was replaced at least 
every four weeks. All animals were kept on a 12:12-hr light/dark cycle, 
at an ambient temperature of approximately 25°C, with free access to 
water and food (LabDiet 5025, PMI Nutrition, St. Louis, MO, USA). At 
the age of 30 weeks, all animals were euthanized and right articulated 
knees were extracted and placed in 10% NBF for later histopathological 
analyses of knee OA degeneration. 

The use of Hartley guinea pigs aged 7 to 30 weeks is suitable for
studying the degree to which physical activity attenuates knee OA, since
it is during this ontogenetic interval that idiopathic disease onset
typically occurs, with initial histological signs present on the medial
tibial plateau between 12 and 16 weeks of age (Bendele et al. 1989;
Kraus et al. 2010). Importantly, however, OA degeneration in the medial
femur and lateral knee compartment usually appears later in ontogeny and
is less severe than that in the medial tibia (Bendele and Hulman 1988).
Thus, this experiment was designed specifically to examine the effects
of physical activity on knee OA onset rather than more advanced stages
of the disease, and our analyses of OA degeneration focused on just the
medial tibia.

Fig. 1. Histological section of the medial plateau of a guinea pig
proximal tibia. Pink arrows are pointing to the locations of (A) the
tidemark used to define cartilage thickness, (B) cartilage degeneration,
(C) the synovium, and (D) an osteophyte.


Tibial histopathology 


Right knee joints were decalcified for 10 days in 10% formic acid and 
embedded in paraffin in a slightly flexed position (Kraus et al. 2010). 
Two 8-µm coronal sections of the medial compartment were prepared,
one anterior and one posterior, and then stained with toluidine blue 
(Fig. 1). Tibial histopathological assessments were performed blinded on 
the sections at a magnification of 25x using an ocular micrometer. 
Cartilage thickness (defined as depth to tidemark) was measured at the 
mediolateral midpoint of cartilage width (defined as the total mediolat- 
eral span of cartilage across the load-bearing surface of the medial tibial 
plateau). Cartilage width was divided into three equal-diameter zones 
(medial, central, and lateral) and cartilage degeneration in each zone was 
scored following an established method (Bendele et al. 1996). Scores 
were based on the evaluation of chondrocyte death/loss, matrix fibrilla- 
tion/loss, and aggrecan loss, with chondrocyte loss being the main deter- 
minant of the scores. Degeneration in each zone was scored on a scale 
from 0 to 5 (none to severe), and a 3-zone-sum for degeneration was
calculated by adding the values obtained for each zone.


WORDS, BONES, GENES, TOOLS: DFG CENTER FOR ADVANCED STUDIES


Effects of physical activity and impact loading rate on knee osteoarthritis


Following an established method (Bendele et al. 1996), synovial
inflammation was scored on a scale from 0 to 5 (none to severe) based
on the presence or absence
of an increased number of synovial lining cell layers and proliferation of 
the subsynovial tissue. The thickness of osteophytes, when present, was 
measured from the tidemark to the furthest point extending toward the 
synovium. Values from the anterior and posterior sections were averaged. 
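
The scoring arithmetic described above can be sketched in a few lines
of Python. This is a minimal illustration rather than the software used
in the study: the function names and the example scores are
hypothetical, and the sketch assumes that the anterior and posterior
3-zone sums are averaged to give one value per animal.

```python
from statistics import mean


def three_zone_sum(zone_scores):
    """Sum the medial, central, and lateral degeneration scores (each 0-5)."""
    if len(zone_scores) != 3:
        raise ValueError("expected exactly three zone scores")
    if any(not 0 <= score <= 5 for score in zone_scores):
        raise ValueError("each zone score must lie between 0 and 5")
    return sum(zone_scores)


def animal_score(anterior_zones, posterior_zones):
    """Average the 3-zone sums of the anterior and posterior sections."""
    return mean([three_zone_sum(anterior_zones),
                 three_zone_sum(posterior_zones)])


# Hypothetical scores for one animal (not data from the study).
print(animal_score([2, 1, 0], [3, 1, 1]))
```

With three zones each scored from 0 to 5, the 3-zone sum for a single
section can range from 0 to 15.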


Physical activity measurements 


Among animals housed in the cushioned- and stiff-floored rooms, ceil- 
ing-mounted cameras were used to quantify physical activity levels on 5 
separate days throughout the experiment. Cameras recorded at a rate of 5 
frames per second during the 12 hours of each day when the lights were 
on. Room cleaning on these days took place when the lights were off. To 
help track animal movement, red circles were painted on the backs of ani- 
mals using a non-toxic marker (Stoelting Co., Wood Dale, IL, USA). 
Before the beginning of the experiment, camera lens distortion coeffi- 
cients were calculated based on the distortion of a checkerboard pattern 
held at different angles to the cameras. Additionally, a rod of known
length was placed on the ground at different angles to calibrate room
dimensions in the camera recordings. For each 12-hour recording, the 
movement of each animal was automatically tracked from frame to frame 
using DLTdv software (Hedrick 2008). From these data, the average total 
distance traveled by the animals in each room during the 12-hour period 
was calculated for each of the 5 days. 
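
As a rough sketch of how a distance traveled can be derived from
frame-to-frame tracking output, the Python below sums the straight-line
steps between consecutive positions of one animal. It assumes that the
coordinates have already been corrected for lens distortion and that
the rod calibration yields a single scale factor. The function name,
the metres_per_pixel parameter, and the example track are hypothetical
and do not reflect the DLTdv output format.

```python
import math


def total_distance(track, metres_per_pixel):
    """Sum straight-line steps between consecutive (x, y) positions.

    track: list of (x, y) pixel coordinates, one per video frame.
    metres_per_pixel: assumed scale factor from the rod calibration.
    """
    steps = zip(track, track[1:])
    pixels = sum(math.dist(a, b) for a, b in steps)
    return pixels * metres_per_pixel


# Hypothetical three-frame track (not data from the study).
example_track = [(0, 0), (3, 4), (3, 10)]
print(total_distance(example_track, metres_per_pixel=0.01))
```

At 5 frames per second, a 12-hour recording contains 216,000 frames per
animal, so a real track would be far longer than this example.
Averaging the per-animal totals within a room would then give the room
mean for that day.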


Hind limb loading rate on cushioned versus stiff surfaces 


To verify that the foam cushion flooring material had the predicted effect 
of decreasing the rate of hind limb loading, the vertical component of 
ground reaction forces generated by guinea pig locomotion was meas- 
ured using a custom-built force plate, with and without a piece of the 
foam material covering the plate surface. The plate consisted of a load 
cell (Nano 43, ATI, Apex, NC, USA) housed within a 3D-printed chassis 
and a top plate (width x length: 15 x 15 cm) made of carbon fiber 
reinforced nylon. The plate was situated at the center of a plywood track- 
way (width x length: 15 x 400 cm). To collect data, 10 guinea pigs were 
randomly selected from the physically active cohorts. During trials, each 
animal was released at one end of the trackway and moved at a self- 
selected speed down the trackway and across the plate surface. Videos 
recorded in lateral view were used to identify trials resulting in single 
hind limb contacts of the load cell during steady-state locomotion. 
Animal speed was determined from videos based on the time required for a
fixed anatomical landmark (the nose) to pass between markers on either
side of the trackway. Ground reaction force data were collected at 4 kHz
and imported into Igor Pro software (v7.1, WaveMetrics Inc., Lake 
Oswego, OR, USA) via an analog-to-digital converter (USB-6521, 
National Instruments, Austin, TX, USA) and filtered with smoothing
spline interpolation (smoothing factor<0.05). Data from 10 hind limb 
contacts were collected with the foam material covering the plate surface 
and 10 hind limb contacts without the foam material. For each trial, the 
linear rise of the ground reaction force was measured using Igor Pro soft- 
ware to calculate hind limb loading rate. Values were normalized to body 
weight to facilitate comparisons across individuals. 
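
One way to express a loading rate in units of body weight is sketched
below in Python. It is an illustration under stated assumptions, not
the Igor Pro procedure used in the study: the linear rise is taken to
be a user-chosen window of samples, the slope is estimated by ordinary
least squares, and the function name, window indices, and example
values are all hypothetical.

```python
def loading_rate_bw_per_s(force_n, sample_rate_hz, body_weight_n, start, stop):
    """Least-squares slope of force over samples [start, stop), in bw/s.

    force_n: vertical ground reaction force samples in newtons.
    start, stop: assumed indices bounding the linear rise of the force.
    """
    window = force_n[start:stop]
    n = len(window)
    if n < 2:
        raise ValueError("the window must contain at least two samples")
    times = [i / sample_rate_hz for i in range(n)]
    t_mean = sum(times) / n
    f_mean = sum(window) / n
    numerator = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, window))
    denominator = sum((t - t_mean) ** 2 for t in times)
    slope_n_per_s = numerator / denominator
    # Dividing by body weight allows comparison across individuals.
    return slope_n_per_s / body_weight_n


# Hypothetical ramp rising 0.5 N per sample at 4 kHz (not study data).
ramp = [0.5 * i for i in range(20)]
print(loading_rate_bw_per_s(ramp, 4000, body_weight_n=10.0, start=4, stop=12))
```

Fitting a slope across the whole window, rather than differencing two
samples, makes the estimate less sensitive to noise in any single
sample.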


Tibial strain rate on cushioned versus stiff surfaces 


To further verify that the foam cushion flooring material decreased the 
rate of hind limb loading, an additional male Hartley guinea pig (aged 10 
weeks) was purchased and used to measure tibial diaphyseal strain rates 
during locomotion on a treadmill with and without the foam material 
covering the treadmill belt surface. Under isoflurane general anesthesia, 
a single-element strain gauge (Sokki Kenkyujo, Tokyo, Japan) was 
affixed to the anterior surface of the left tibial mid-diaphysis of the ani- 
mal. At the gauge site, a small skin incision was made to gain access to 
the bone, an area (2 x 2 mm) of the periosteum was elevated, the bone 
surface was degreased with chloroform, and the gauge was glued to the 
prepared surface with cyanoacrylate. Care was taken to align the gauge 
element with the long axis of the bone. Gauge leads were passed subcu- 
taneously and emerged through a small skin incision on the animal’s 
back. Incisions were sutured closed. Strain data were recorded 12 hours 
after surgery while the animal trotted (30 m/s) on a level motorized tread- 
mill (Woodway, Waukesha, WI, USA) with a belt composed of stiff, 
rubber-coated steel slats (0.56 x 0.07 m). Voltage changes in strain 
gauges were conditioned and amplified (Vishay 2150, 
MicroMeasurements Inc., Raleigh, NC, USA), and data were acquired 
through a DAQ board (PowerLab, ADInstruments, Colorado Springs, 
CO, USA) run by LabChart software (ADInstruments). Data were 
recorded from 10 gait cycles with the animal moving on the treadmill 
with strips of the foam cushion material attached to the belt slats, and 
from 10 gait cycles without the foam cushion material. For each gait 
cycle, the strain rate was calculated using Igor Pro software. 


Statistics 


Shapiro-Wilk tests were used to determine if data followed a normal dis- 
tribution, and Levene's tests were used to assess the equality of group 
variances. Statistical evaluation of differences among animals assigned 
to the sedentary group and those housed in the cushioned- and stiff- 
floored rooms was conducted with an analysis of variance (ANOVA) fol- 
lowed by a Tukey’s honestly significant difference (HSD) multiple 
comparisons test. When the equal variances assumption was violated, a 
Games-Howell (GH) multiple comparisons test was carried out. A
general linear mixed model (GLMM) was used to compare hind limb
loading rates (in units of body weight) during locomotion across the force
plate with and without foam cushioning covering the plate surface, with 
animal speed included as a covariate, and animal identity and housing 
condition (cushioned- or stiff-floored room) included as random effects. 
Independent-samples t-tests were used to assess differences in tibial 
diaphyseal strain rates during locomotion on the treadmill with and with- 
out foam cushioning attached to the treadmill belt surface, as well as 
average daily movement distances between animals housed in the cush- 
ioned- versus stiff-floored rooms. Statistical analyses were conducted 
using JMP Pro software (v. 15, SAS Inst., Cary, NC, USA) and SPSS 
software (v. 20; IBM Corp., Armonk, NY, USA). Statistical significance 
was judged using a 95% criterion (P<0.05), and tests were two-tailed. 
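
For readers unfamiliar with the three-group comparison, the F statistic
underlying a one-way ANOVA can be sketched in plain Python as follows.
This is only an illustration of the calculation, not the JMP Pro or
SPSS procedure used in the study. The function name and the example
values are hypothetical, and the sketch stops short of the P value
(which requires the F distribution) and of the Tukey HSD or
Games-Howell follow-up comparisons.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three groups (not data from the study).
sedentary = [5, 6, 7]
cushioned = [2, 3, 4]
stiff = [1, 2, 3]
print(one_way_anova_f([sedentary, cushioned, stiff]))
```

A large F indicates that the group means differ by more than the
within-group scatter would suggest, which is what motivates the
pairwise comparisons that follow the ANOVA.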


RESULTS 
Effects of surface cushioning on hind limb biomechanics 


Guinea pig locomotion across the force plate with the foam cushion 
material attached to the plate surface engendered hind limb loading rates 
that were, on average, 37% lower than those produced during locomotion 
on a stiff, uncushioned force plate, after controlling for self-selected 
locomotor speeds (GLMM: P=0.035; Fig. 2A). Tibial diaphyseal strain 
rates generated during locomotion at a fixed speed on the treadmill with 
the foam cushion material attached to the belt surface were, on average, 
15% lower compared to locomotion at the same speed on a stiff,
uncushioned treadmill belt (t-test: P<0.01; Fig. 2B).


Physical activity levels 


Throughout the experiment, animals housed in the rooms with cushioned 
and stiff floors both voluntarily engaged in higher levels of physical 
activity than was possible among the sedentary animals housed in small,
restrictive cages (Fig. 2C). The minimum and maximum average daily 
(12-hr) movement distances measured among animals in either room 
were 1.23 km and 6.12 km, respectively. Nevertheless, the activity levels
of animals in the two different rooms were distinct. Specifically, the
average daily movement distances of animals in the cushioned-floored
room were 170% greater than those of animals in the stiff-floored room
(t-test: P<0.001).

Fig. 2. Hind limb mechanical environment on cushioned versus stiff
surfaces. (A) LS mean hind limb loading rate (±s.e.) during locomotion
across a force plate with and without foam cushioning covering the plate
surface, controlling for self-selected locomotor speeds. Hind limb
loading rate is in units of body weight (bw). (B) Mean tibial diaphyseal
strain rate (±s.d.) during locomotion on a treadmill at a fixed speed
with and without foam cushioning attached to the treadmill belt surface.
(C) Mean average daily movement distance (±s.d.) among animals housed in
the cushioned- and stiff-floored rooms. Lines above bars indicate
significant group differences. Gray dots in (B) and (C) represent
individual data points.

Fig. 3. Body size among sedentary animals and physically active animals
in the cushioned- and stiff-floored rooms. (A) Changes in mean body
weight (±s.d.) among animals in the three groups during the experiment.
(B) Differences in mean snout-rump length (±s.d.) among animals in the
three groups at the end of the experiment. (C) Differences in mean
tibial length (±s.d.) among animals in the three groups at the end of
the experiment. Key to symbols in (A): * Stiff significantly less than
both Sedentary and Cushion; ¢ Stiff and Cushion both significantly less
than Sedentary; 8 Stiff significantly less than Sedentary. Gray dots in
(B) and (C) represent individual data points, and lines above bars
indicate significant group differences.


Body size 


At the start of the experiment, average body weight was similar among 
animals assigned to the sedentary group and those housed in the cush- 
ioned- and stiff-floored rooms (ANOVA: P=0.88; Fig. 3A). Throughout 
the experiment, body weights increased at a similar rate among animals 
in the sedentary group and physically active animals in the cushioned- 
floored room, such that by the end of the experiment, average body 
weight did not differ significantly between the two groups (HSD: 
P=0.14). However, among physically active animals in the stiff-floored 
room, body weights increased markedly less during the experiment than 
among animals in the other two groups, especially after 18 weeks of age. 
By the end of the experiment, average body weight among animals in the 
stiff-floored room was 17% and 13% lower than among sedentary ani- 
mals and those in the cushioned-floored room, respectively (HSD: 
P<0.0001 for both comparisons). 

At the end of the experiment, average snout-to-rump length did not 
differ significantly between animals in the sedentary group and physi- 
cally active animals in the cushioned-floored room (HSD: P=0.81; 
Fig. 3B), nor did average tibial length (HSD: P=0.99; Fig. 3C). Among 
physically active animals in the stiff-floored room, however, average 
snout-to-rump length was 5% less than among animals in both the seden- 
tary group and those in the cushioned-floored room (HSD: P<0.0001 for 
both comparisons). The average tibial length was 3% shorter among ani- 
mals in the stiff-floored room, compared to animals in both of the other 
two groups (HSD: P=0.020 and P=0.024 for sedentary and cushioned- 
floored group comparisons, respectively). 


Knee OA degeneration


In the medial tibia, at the end of the experiment, sedentary and physically 
active animals in the cushioned- and stiff-floored rooms presented carti- 
lage of similar thickness (ANOVA: P=0.58; Fig. 4A). However, cartilage 
degeneration scores were significantly higher among sedentary animals 
than among physically active animals in both the cushioned- and stiff- 
floored rooms (GH: P=0.049 and P=0.029, respectively; Fig. 4B), as 
were synovial inflammation scores (GH: P=0.019 for both comparisons; 
Fig. 4C). Osteophytes were present in 53% (8/15) of sedentary animals 
and 67% (10/15) and 47% (7/15) of physically active animals in the cush- 
ioned- and stiff-floored rooms, respectively. Osteophytes among seden- 
tary animals were, on average, 92% and 121% larger than those of 
physically active animals in the cushioned- and stiff-floored rooms, 
respectively (HSD: P<0.001 for both comparisons; Fig. 4D). No signifi- 
cant differences were detected between the two physically active groups 
in cartilage degeneration scores (GH: P=0.79), synovitis scores (GH: 
P=0.99), or osteophyte size (HSD: P=0.89). 


DISCUSSION 


In this study, to assess the effects of physical activity and impact loading 
rate on knee OA degeneration, growing guinea pigs were raised for 22 
weeks in one of three groups. A group of sedentary animals was housed 
in small cages that restricted mobility, and two groups of animals were 
housed in one of two large rooms that promoted voluntary physical activ- 
ity. One room had a stiff concrete floor to engender high rates of hind 
limb loading during locomotion, and the other room had a floor covered 
with foam cushioning to engender lower rates of hind limb loading. 
Measurements of hind limb ground reaction forces and tibial bone strains 
confirmed that the cushion flooring material had the intended effect of 
generally lowering rates of hind limb loading. Our first hypothesis was 
that relative to sedentary animals, both groups of physically active ani- 
mals would undergo less knee OA degeneration during the experiment. 
The results support this hypothesis. At the end of the experiment,
compared to sedentary animals, both groups of physically active animals had
significantly lower knee cartilage degeneration scores, lower synovial 
inflammation scores, and smaller osteophytes when present. Our second 
hypothesis was that between the two physically active groups, animals in 
the stiff-floored room would experience no more knee OA degeneration 
than animals housed in the cushioned-floored room. The results also support
this hypothesis. The two physically active groups had similar knee carti- 
lage degeneration scores, synovial inflammation scores, and sizes of 
osteophytes when present. Overall, the results of this study provide sup- 
port for the idea that physical activity, instead of being inevitably bad for 
knee health, has the potential to attenuate knee OA degeneration 
(Berenbaum et al. 2018; Griffin et al. 2012; Otterness et al. 1998; Wallace 
et al. 2017, 2019, 2022), including activities involving high rates of load- 
ing (Holowka et al. 2021; Wallace et al. 2018). 

The precise pathways by which physical activity might attenuate 
knee OA degeneration are not fully known, but there are at least two pos- 
sibilities. First, physical activity might help prevent the accumulation of 
excess body weight, a well-known risk factor for knee OA (Bendele and 
Hulman 1991; Felson et al. 1988; Wluka et al. 2013). Excess body weight 
likely affects knee OA degeneration by producing a combination of adi- 
posity-induced metaflammation and abnormal joint loading (Berenbaum 
et al. 2018; Zapata-Linares et al. 2021). Throughout most of our experi- 
ment, physically active animals in the stiff-floored room had signifi- 
cantly lower body weight than sedentary animals, which may have con- 
tributed to their lower levels of knee OA degeneration. Differences in 
body weight, however, do not clearly explain differences in knee OA 
degeneration between physically active animals in the cushioned-floored
room and sedentary animals, since body weights were generally similar
between the two groups. It is possible, though, that sedentary animals
still had greater adiposity and hence adiposity-induced metaflammation.
Second, physical activity while young may help promote the growth of 
stronger knee tissues that are more resistant to OA degeneration later in 
life (Helminen et al. 2000). Specifically, previous experiments have 
shown that young animals treated with exercise develop thicker knee car- 
tilage, higher cartilage aggrecan content, and increased cartilage stiffness 
(Jurvelin et al. 1986; Kiviranta et al. 1988; Säämänen et al. 1988, 1989).
In our study, cartilage thickness was found to be similar among animals 
in all three groups at the end of the experiment, but it is possible that there 
were important differences in knee tissue development between physi- 
cally active and sedentary animals that we did not measure. Ultimately, 
additional research is necessary to better understand the pathways that 
underlie the benefits of physical activity for attenuating knee OA degen- 
eration. 

During knee loading, cartilage and subchondral bone undergo defor- 
mation, which helps to minimize stresses within the cartilage matrix. 
Because knee cartilage and subchondral bone are viscoelastic tissues, 
less deformation is expected to occur when loads are generated more
rapidly. It is for this reason that physical activities exposing knees to high
rates of loading have been hypothesized to be especially likely to cause 
cartilage damage and OA degeneration (Brandt et al. 2009; Radin 2004). 
This hypothesis, while sensible, is not supported by our finding that the 
physically active animals in the cushioned- and stiff-floored rooms had 
similarly low levels of knee OA degeneration relative to sedentary ani- 
mals, despite animals in the stiff-floored room experiencing generally 
higher rates of hind limb loading. Also inconsistent with this hypothesis 
are the results of many previous studies showing that habitual engage- 
ment in activities producing high rates of knee loading (e.g., long-dis- 
tance running) is not a strong predictor of knee OA risk (Lo et al. 2017; 
Newton et al. 1997; Timmins et al. 2017). Although some prior animal 
experiments provide support for the idea that higher rates of loading can 
be more damaging to joint tissues, these studies involved artificial loads 
applied under non-physiologic conditions (Ewers et al. 2002; Radin et al. 
1985; Yang et al. 1989). In the only experiment that we are aware of 
besides our own that investigated the effects of higher versus lower rates 
of natural loading on knee tissues under physiologic conditions, sheep 
were forced to walk for 4 hr per day for 2.5 yr on either a stiff concrete 
surface or a compliant wood chip surface (Radin et al. 1982). At the end 
of the experiment, none of the animals in either group were found to have 
major signs of OA degeneration in any of their limb joints. These results, 
together with our own, suggest that knee cartilage and other tissues are 
adapted to withstand loads applied both rapidly and gradually, as long as 
loading is within the physiologic range. 

Two findings that were unanticipated in our study are the marked dif- 
ferences in voluntary physical activity levels and body weights between 
animals in the cushioned- and stiff-floored rooms. When we noticed 
these differences during the experiment, we suspected they might have 
been due to temperature differences between the cushioned and stiff 
floors. To assess this idea, we measured room floor temperatures using a 
laser infrared thermometer gun (Lasergrip 1080, Etekcity Corp., Ana- 
heim, CA, USA) on 5 days during the last 4 weeks of the experiment. On 
all days, ambient temperatures in both rooms were approximately 25°C, 
but the average floor temperatures in the cushioned- and stiff-floored 
rooms were 21.2°C and 17.3°C, respectively (t-test: P<0.001). The lower 
critical temperature of the guinea pig thermoneutral zone is approx- 
imately 20°C (Gordon 1986), and previous studies have shown that 
rodents raised at temperatures below their lower critical temperature can 
exhibit both decreased physical activity levels and body weights, as well 
as other traits that were ultimately found to be characteristic of animals 
raised in the stiff-floored room, including shorter limbs and body lengths 
(Chevillard et al. 1963; Robbins et al. 2018; Serrat 2014; Vaanholt et al. 
2007). Thus, it seems likely that cooler floor temperatures contributed to 
the lower physical activity levels and body weights of the animals in the 
stiff-floored room. 

Regardless of the exact causes of differences in physical activity and
body weight between animals in the cushioned- and stiff-floored rooms, 
it is important to consider how such differences might affect interpreta- 
tions of our results related to knee OA. Importantly, our first hypothesis, 
that physical activity has the potential to attenuate knee OA degeneration, 
is well supported by comparisons between the sedentary animals and 
physically active animals in the cushioned-floored room (Wallace et al. 
2022). Moreover, even given their lower physical activity levels and 
body weights compared to animals in the cushioned-floored room, ani- 
mals in the stiff-floored room were still more physically active than 
sedentary animals. Thus, our first hypothesis is also supported by com- 
parisons between the sedentary animals and physically active animals in 
the stiff-floored room. However, in terms of our second hypothesis, that 
higher rates of activity-induced loading do not cause more knee OA 
degeneration than lower rates of loading, we cannot rule out the possibil- 
ity that had physical activity levels and body weights been more alike 
between animals in the cushioned- and stiff-floored rooms, then levels of 
knee OA degeneration would not have been as similar between the two 
physically active cohorts. To rigorously evaluate this possibility, a more 
controlled experiment will need to be conducted in the future. Neverthe- 
less, we maintain that none of our findings related to knee OA degenera- 
tion are inconsistent with our second hypothesis, nor are any results sup- 
portive of the common view that routine physical activities that expose 
knees to high rates of loading are especially harmful to knee tissues. 

Another finding that deserves consideration is that physical activity 
decreased knee OA degeneration but did not prevent the disease 
altogether among animals in either the cushioned- or stiff-floored rooms. 
Similar results were obtained in a previous study in which we investi- 
gated the effects of daily treadmill running on Hartley guinea pig knees 
and found that running reduced knee OA degeneration but did not totally 
inhibit the disease (Wallace et al. 2019). At the time of that study, we 
interpreted the finding as likely being due to the modest size of the exer- 
cise dosage that the runners received, amounting to only 3% of total time 
per day. We hypothesized that larger doses of physical activity might pre- 
vent knee OA outright. The results of the current study do not support this 
hypothesis, particularly the results from the physically active animals 
housed in the cushioned-floored room. Compared to the treadmill 
runners in our previous study, animals in the cushioned-floored room 
took, on average, roughly 4 times more steps per day, yet they still 
experienced some knee OA degeneration. In retrospect, it seems possible 
that vulnerability to knee OA is so high among Hartley guinea pigs that 
no dosage of physical activity (or any other potentially prophylactic 
action) could entirely prevent the disease (Bendele and Hulman 1988; 
Brismar et al. 2003; Hyttinen et al. 2001). Indeed, compared to other 
laboratory stocks of guinea pigs, Hartley guinea pigs have been shown to 
be much more susceptible to knee OA degeneration even when kept 
under identical environmental conditions (Huebner et al. 2002). If knee 
OA degeneration is basically inevitable among Hartley guinea pigs, then 
it may be better for future experimental studies of the preventative poten- 
tial of physical activity to employ an alternative model system. Even so, 
regardless of why physical activity failed to totally prevent knee OA in 
our two experiments, both studies provide direct evidence that routine 
engagement in physical activity can at least attenuate knee OA degenera- 
tion. 

From an evolutionary perspective, the results of this study are impor- 
tant because they provide additional support to the idea that knee OA rep- 
resents an example of a ‘mismatch disease’ that is caused, in part, by the 
musculoskeletal system being poorly adapted to environmental factors 
that were once rare but are now common, including excessively sedentary 
lifestyles (Berenbaum et al. 2018; Wallace et al. 2017). In contrast to 
laboratory guinea pigs, wild guinea pigs rarely experience knee OA 
degeneration (Rothschild 2003). Our findings suggest that this is likely 
partly because guinea pigs evolved to engage in higher levels of physical 
activity than laboratory animals are normally allowed (Zipser et al. 
2014), and probably on surfaces of variable stiffness, hence their knees 
evolved to require routine physical activity to develop and function opti- 
mally and remain healthy. Since human musculoskeletal biology evolved 
among ancient physically active hunter-gatherers, human knees also pre- 
sumably evolved to require and benefit from frequent loading, including 
high-rate knee loading like that engendered by walking long distances 
barefoot. However, even if knee OA is a mismatch disease, it would not 
cease to exist even if every person and guinea pig in the world adjusted 
their physical activity levels to more closely match those of their ancient 
ancestors. Trauma and other risk factors for knee OA have always predisposed, 
and always will predispose, some individuals to the disease. Nevertheless, our 
results suggest that habitual engagement in physical activity may be a 
powerful strategy for attenuating knee OA degeneration, and that seden- 
tism may be a greater threat to knee health than is often assumed. 

Finally, the results of this study are relevant to anthropological 
studies of knee OA among fossil and archeological human skeletons. 
Based on the traditional view of knee OA as being caused by wear and 
tear produced by physical activity, many anthropologists have assumed 
that signs of knee OA in ancient human skeletons can be interpreted as 
evidence of a lifestyle characterized by high levels of physical activity 
(e.g., Austin 2017; Bridges 1991; Cheverko and Bartelink 2017; Jurmain 
1977, 1999; Klaus et al. 2009; Larsen 1982, 2015; Larsen et al. 1995; 
Lieverse et al. 2007, 2016). The findings of this study highlight that not 
all types of physical activity should be assumed to be associated with 
greater knee OA degeneration, including everyday activities like walking 
and running that produce most of the activity-related loads that knees 
normally experience. Therefore, determining whether or not a person’s 
skeletal remains exhibit signs of knee OA is almost certainly an inaccu- 
rate way of assessing their overall physical activity levels during life. 
Consequently, caution is required when interpreting knee OA as evidence 
of a highly physically active lifestyle, as well as the absence of knee OA 
as evidence of a more sedentary lifestyle. In all likelihood, many of our 
ancient ancestors who developed knee OA did indeed engage in high 
levels of physical activity, but presumably so did many people who never 
developed the disease. 


CHAPTER FOUR 


A functional framework to grasp goal-directed 
behavior: Stone knapping, a good paradigm of 
human behavior 


Blandine Bril 


Abstract 


Based on an extensive review of 25 years of experiments on stone knapping and, 
more specifically, on the percussive activity involved, we argue that a functional 
framework warrants a better understanding of goal-directed action and stone 
knapping in particular. Based on a clear differentiation between the technique 
that refers to the physical mode of action on matter, and the method that refers to 
how the technique is used and is characterized by the spatial and temporal 
organization of different flaking actions, we show the necessity to develop a 
detailed description of the flaking behavior. The emphasis put on cognition has 
obscured the complexity of flaking action and, more specifically, percussive 
action. Based on a bottom-up perspective rooted in an ecological-dynamic 
framework that takes the task constraints—i.e., the conchoidal fracture mechan- 
ics—as its starting point, we show that knapping skill is grounded in the finely 
tuned capacity to produce the right kinetic energy required for the detachment of 
the desired flake, which takes years to master. Further, we show that, due to the 
great number of degrees of freedom of the human body, the movements per- 
formed are unique to each person. These results emphasize the critical role of the 
mastery of the technique and are fundamental to understanding the acquisition 
process of knapping skills. 


PART I. DEFINING THE PROBLEM 


Acting in everyday life presupposes the capacity to perform goal- 
directed actions—i.e., the faculty to produce conclusive behavioral 
sequences that bring the actor nearer to the objective. A distinction is 
consequently called for between the intentional aspect of the action (i.e., 
the goal to be achieved) and its operational aspect (i.e., the manner in 
which the goal is achieved). In this context, the following questions are 
addressed: How to bridge the gap between the idea “I want to do a certain 
thing” and the behavior that will allow such realization? What are the 
prerequisites to succeed—i.e., what skill and “knowledge” must have 
been acquired to ultimately attain the goal? What does “understanding” 
behavior mean? 

To illustrate these issues, let's start by imagining different people 
engaged in everyday life activities: 


“This morning the sky is clear and I decide to go for a walk, but first 
I want to drink a glass of water. In the apartment next to mine, 
Johanna plays the violin. Not far away in the dance school, Kim 
learns the pivot-turn. In the town of Khambhat (India), Hussein is 
knapping cylindrical cornelian beads, and a few kilometers further, 
Prabin throws large pots. In East Africa about 1.7 Myr ago, our 
ancestors knapped the first handaxes.” 


Taken from among an almost infinite number of everyday life activ- 
ities, what do these examples tell us about purposeful behavior (whether 
they are performed daily or occasionally, and whether they look “quite 
common” and “simple” or more elaborate and complex)? 

When looking at a person performing any of these tasks, how do we 
explain the processes that take place as the desire/intention to reach a 
goal gives rise to the achievement of a sequence of adapted (in the case of 
success) actions? The examples above, which may or may not involve a tool, 
imply a sequence of connected actions involving the body moving in 
space and time. In this context, how do we explain the production of pur- 
poseful behavior? To answer this question a clear description of the goal 
is needed to decode and interpret the behavioral sequence the actor is 
engaged in to be able to reach that goal. That is the purpose of this paper. 

Focusing on complex real-life activities such as those described ear- 
lier, and more specifically on stone knapping, this paper sets out to show 
how a functional definition of the task to be achieved offers a comprehen- 
sive understanding of behavior and, consequently, an in-depth compre- 
hension of the learning process. 


The challenge of task definition 


When I decide to go for a walk, to take a glass of water, to play the violin, 
to perform a pivot turn, or to knap a cornelian bead or a hand-axe, I will, 
of course, achieve a sequence of organized body movements. But is that 
my goal? Obviously not. In the case of walking, my goal is to go from 
one place to another—i.e., moving my body ahead by means of a succes- 
sion of steps that alternate single and double support phases. When I take 
a glass of water, I want to move the glass to my mouth/lips so that I can 
drink. Similarly, when playing the violin, Johanna’s purpose is to 
produce a fine melody through the vibration of the strings. The same applies 
to Kim when dancing a pivot turn: although the precisely coded 
movement may be regarded as the focus of the dance, the aim/purpose is to 
make the body turn along the vertical axis. Finally, the goal of the potter 
is to convert a lump of clay into a specific shape, and in a similar way, 
the knapper’s goal is to transform a piece of rock by taking off a succes- 
sion of flakes. However, these are all just descriptions of behavior; they 
do not explain how and why the actor succeeds in reaching the goal. 

What is common to all these examples is that, whatever the task to be 
achieved, what causes the fulfillment of the goal is the production of 
forces. Each of these tasks involves the movement of different elements 
in play, including the actor’s body. Thus, at the level of the task, setting 
these different elements into motion requires, by definition, the 
production of forces specific to the task to be performed. It is because the 
walker produces feedforward forces that the body moves ahead. To 
succeed in bringing the glass toward their mouth, the drinker needs to apply 
a friction force on the glass of water and, simultaneously, a tension force 
to move the glass upward. To produce a sound, the violinist needs to cause the 
strings to vibrate through the movements of the bow; this sequence of 
movements is characterized by a cycle of stick and slip that involves both 
static (sticking) and kinetic friction (sliding) forces as the bow and the 
string come into contact with each other (Rasamimanana 2008). A pivot 
turn is characterized by the production of an angular momentum, result- 
ing in the production of a turning motion of the body around the vertical 
axis (Laws 1979; Shim 2016). When considering wheel-thrown pottery, 
the transformation of a lump of clay into a pot depends on the combined 
actions of manual pressure and the kinetic energy of rotation (Gandon et 
al. 2016). Finally, whether it is the Indian craftsman knapping a cornelian 
bead or the prehistoric (or modern) knapper knapping a hand-axe, the 
bead or the axe will take shape thanks to a succession of percussions 
(elastic blows) that depend on the production of a given amount of 
kinetic energy at contact (between the hammer and the stone) (Bril et al. 
2010). 
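
For reference, the kinetic energy mentioned here follows the standard textbook relation below. The symbols $m$ (hammer mass) and $v$ (hammer velocity at the moment of contact) are introduced only for illustration and are not notation used in this chapter.

```latex
E_k = \tfrac{1}{2}\, m\, v^{2}
```

Under this relation, the same energy at contact can in principle be obtained with a heavier hammer moving slowly or a lighter hammer moving quickly, which is one simple way to see why different movements can satisfy the same task requirement.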

Although different from each other, these examples have one thing in 
common: To succeed, the actor must produce forces that are precisely 
suited to the task although the immediate circumstances may impact the 
actual behavior of the actor. Of course, the actor will not express their 
behavior in terms of production of forces, and it is indisputable that the 
whole body sustains the production of these forces thanks to its overall 
movements. In this context, what are the functional parameters the actor 
has to regulate to produce and control these forces by means of body 
movements? If we go back to the examples above, to walk from one 
place to another, the walker must produce feedforward forces, which are 
created owing to the continuous alternation and control in the increase 
and decrease of the distance/span between the center of pressure and the 
center of mass (Brenière et al. 1987; Bril et al. 2015). In the case of the 
violinist, the essential principles relating to the production of sound can 
be summarized by the relationship between speed and pressure of the 
bow on the string (Rasamimanana 2008); below or beyond certain ratios 
between these two variables, the string cannot be set in vibration. To 
adjust the centrifugal forces the potter has to control the throwing speed 
of the wheel through movements that will depend on the type of wheel 
(electric wheel, fly-wheel, or kick-wheel) (Gandon et al. 2013). To 
initiate a pivot turn, the dancer must exert a torque on the floor resulting 
in the production of a turning motion of the body; the speed of the turn 
depends in part on both the amount of friction with the ground, which is 
controlled through the vertical impulse prior to the rotation, and the 
variation of inertia of the body, which increases or reduces the angular 
velocity (Dietrich 2016; Laws 1979; Shim 2016). Let us now consider the 
percussive action required to produce a flake according to the conchoidal 
fracture: The fracture develops only if the blow is energetic enough and 
the amount of energy required is contingent upon the characteristics of 
the desired flake (Bril et al. 2010, 2012; Nonaka et al. 2010). 
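
The dependence of the pivot turn on the body's inertia can be illustrated with a textbook rigid-body idealization. This is a sketch under simplifying assumptions (a rigid body and negligible friction torque once the turn is under way), and the symbols $L$, $I$, and $\omega$ are introduced here for illustration only.

```latex
L = I\,\omega, \qquad
I_1\,\omega_1 = I_2\,\omega_2
\;\Longrightarrow\;
\omega_2 = \frac{I_1}{I_2}\,\omega_1
```

Here $L$ is the angular momentum about the vertical axis, $I$ the moment of inertia, and $\omega$ the angular velocity. In this idealization, bringing the limbs closer to the axis ($I_2 < I_1$) speeds up the turn, and extending them slows it down.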

The intention of this long introduction on real-life examples of pur- 
poseful actions is to point out the difficulty in describing goal-directed 
actions in order to understand what (motor) skills mean. We will show 
later on, through an in-depth analysis of stone knapping, that regardless of 
the activity, different body movements allow the achievement of the 
same goal. Hence, following N. Bernstein (Bernstein 1967, 1996), we 
consider that it is not the movement per se that is the focus of the actor, 
but the assessment of how to satisfy the functional constraints of the task 
(Newell 1986; Bril et al. 2015). This critical point will be expanded in the 
second part of this paper. 


What framework can we use to decode and explain goal-directed 
behavior? 


To explain the transformation of an intention into a concrete episode of 
instrumental actions directed toward a goal, it is common among cogni- 
tive scientists to attribute behavior to an intelligent “executive function 
module” or executive control mechanisms (Doebel 2020; Pargeter et al. 
2019; Wynn and Coolidge 2017). This theoretical framework ascribes the 
emergence of a coherent and well-ordered succession of actions, postures 
and movements to (1) some sort of mental representations, motor sche- 
mas, motor programs, motor procedures (Coolidge and Wynn 2005; 
Marchand 2010; Pacherie 2018; Pastra and Aloimonos 2012; Pelegrin 
2005; Wynn and Coolidge 2004; among others), and to (2) action plans, 
motor planning, and prospective planning (Pacherie 2018; Pelegrin 2005; 
Putt et al. 2017; Stout and Chaminade 2012; Uomini and Meyer 2013). 
This prescriptive approach stipulates that the observed behavior is the 
result of internal models in which a principal role is attributed to a control 
process that is implemented in the brain. Intention triggers are supposed 
to activate the appropriate motor representation and planning depending 
on the situation, guiding and controlling the execution of the sequences 
of movements. 

For a few decades, the development of neurosciences has reinforced 
the belief that studying the brain will provide the keys for understanding 
behavior. Hence, a large number of studies focus on brain activity, con- 
sidering that a better understanding of how the brain works can better 
explain behavior. Tool use behavior, for example, is associated with the 
existence of specifically assigned regions in the brain’s left hemisphere 
(Johnson-Frey 2004; Johnson-Frey et al. 2005; Orban and Caruana 2014; 
Ramayya et al. 2010; Stout and Khreisheh 2015; van Elk et al. 2014; 
among others). Within this neurocognitive framework, it is explicit that a 
causal link prevails between brain and behavior, as in Johnson-Frey’s 
unambiguous statement: “behaviors associated with complex tool use 
arise from functionally specialized networks (in the brain) involving tem- 
poral, parietal and frontal areas within the left cerebral hemisphere” 
(2004: 71). 

However, back in real life, the puzzle stands: how can we bridge the gap 
between internal representations (i.e., brain activity) and overt 
behavior? This is a ‘miracle’, to use Kunde’s term (2001), although this 
miracle is often taken for granted. But how can an “abstract 
representation”, regardless of its nature, be translated into actual motor 
behavior? Furthermore, how can the activation of a specialized network in the 
brain give rise to a sequence of organized and efficient actions? What 
could be the exact role and function of this network with regard to real, 
tangible behaviors? 

In the different examples above, what would constitute an internal 
model? A motor command or a motor program? What would be their 
functions? How would they be “selected”? Would they be essential to the 
performance of an action, and the guiding and control of its execution? A 
few years ago, J. J. Summers and J. G. Anson (2009) published an 
in-depth review of the notion of “motor program” and concluded that, 
although the concept is controversial and no one really knows what it is, 
everyone still uses it. In their detailed discussion, the authors 
showed that there is no consensus on what a program is, what a motor 
representation is, what they contain, or how and where they are 
created. Consequently, it is legitimate to ask whether this notion should 
not be seen as merely metaphorical with no great explanatory power. 

Indeed, the implicit assumption behind the neuroscience model is 
that a detailed description of how the brain works will bring an answer to 
how behavior is generated. Correspondingly, this approach requires 
breaking the brain into the smallest cause-effect components as a 
necessary condition to understand a larger framework of the brain. Now, is it 
possible to understand the behavior of a system by looking only at its 
lower-level properties? In a recent paper, 
Krakauer and colleagues (2017: 481) provide an extremely clear answer: 
“Relying solely on a collection of neural data, with behavior 
incorporated as an afterthought (and typically over-constrained) will not lead 
to meaningful answers”. If the goal of neuroscience is to explain behavior, 
and not only to understand how the brain works, these authors consider 
that “the neural basis of behavior cannot be properly characterized 
without first allowing for independent detailed study of the behavior itself”! 
(Krakauer et al. 2017: 488). In other words, to understand the 
relationship between behavior A (the brain) and behavior B (actual behavior 
of a person or animal), it is absolutely necessary to study both in equal 
detail. But the literature shows that while the development of 
neuroscience is skyrocketing, behavioral research still lacks in-depth 
development (Krakauer et al. 2017). 

The present paper is framed in light of these observations. Our aim is 
to provide a theoretical framework for describing goal-directed behavior 
in order to better understand what is meant by expertise and learning. 
Indeed, too often action is equated with movement, with movements being 
considered the building blocks of action. We consider here that it is 
important not to confuse these two notions. Based on N. Bernstein's 
research on the physiology of movement (1967, 1996), we consider that 
functional actions are primary, while control of movements and postures 
is secondary. Movements are not the building blocks of action; instead, 
the control of movements is one of the results of the development of 
action (Bernstein 1996; Reed and Bril 1996). Indeed, Bernstein states 
the following: the control that guides the movement “cares only about 
how the movement fits the external, alien space outside the body. It does 
not care much about the biomechanical side of the movement, how joint 
angles will change [...]. It knows one thing: there is enough degrees of 
freedom in an arm to place the wrist into any point of accessible space 
and by many paths. It is none of its business how joint angles actually 
group to reach the goal” (1996: 138). The end of a movement is not the 
movement per se, but the goal it allows the actor to reach; movements exist to 
serve the purpose of the task. The crux of motor skill is not to learn to 
move the body, but to solve motor problems (Bernstein 1996: 146, 181). 
The detailed analysis of stone knapping developed in the next part of this 
paper will emphasize this theoretical point of view. 

1  Italics are mine. 


A functional perspective rooted in an Ecological-dynamics framework 


Adaptive behavior entails continuous interaction between the organism 
and the environment (Chiel and Beer 1997; Heft 2001; Järvilehto 1998, 
2006; Reed 1996). The ecological-dynamical framework, the fusion of 
ecological psychology and the dynamical system theory approach to the 
study of human behavior (Renshaw and Davids 2014), emphasizes this 
point. From this perspective, understanding behavior cannot be reduced 
to either cognitive or biomechanical capacities of the organism alone. 
Adaptive behavior is an expression of the functional coupling between 
the organism (as a whole) and the environment. Such a perspective is 
based on the idea that the optimal unit of observation is that of one 
system: the organism/environment system (Gibson 1979; Mace 1977; Reed 
1996; Smitsman 1997; St Amant and Horton 2008). The behavior 
underlying the fulfilment of a goal-directed action is then best viewed as 
emergent from the state of all the elements involved, and depends on the 
history of each of these elements through a mechanism of self-organization 
(Higgins 1985; Järvilehto 2006; Newell 1996; Reed 1988; Thelen and 
Smith 1996). 

Referring to the constraints theory of K. Newell (Bingham 1988; 
Newell 1986, 1996), we consider that the system under study (organ- 
ism/environment) is grounded in three sources of constraints which com- 
bine to provide the boundary conditions for carrying out an action: the 
organism, the task at hand, and the environment (Newell 1996). The 
organism encompasses the physiological, biomechanical, neurological, 
cognitive, and even affective aspects of the organism, whether it is an 
animal or a human. The task characteristics refer to its functional proper- 
ties—i.e., what the actor has to do to successfully attain the goal. This 
point will be discussed more thoroughly in the second part of this paper 
through the specific case of stone knapping behavior. The environment 
comprises the universal constraints experienced by the organism (such as 
gravity or temperature), and more local characteristics (such as the avail- 
ability of tools). Regardless of the domain of behavior being studied, 
action is regarded as an emergent property of the interaction between 
these three sets of constraints depending on ongoing conditions. Ongoing 
internal conditions refer to the actual state of the organism (i.e., 
tiredness or the need to continue the activity over a long period of time). 
External factors refer to cultural or institutional constraints (Bril 2018). This 
being said, action control cannot be understood unless these 
different kinds of constraints, and the dynamics of their interaction, are 
at the heart of the analysis (Warren 2006). 


How to deal with a sequence of purposeful actions in goal-directed behavior? 
The different levels of goal-directed action 


We have seen that, when engaged in a task, it is necessary to differentiate 
the goal (i.e., what to do) from the means (i.e., how to do it). To account 
for the behavioral course of actions, it is usual to break down the temporal 
sequence into smaller segments or units (Buchsbaum et al. 2015; Endress 
and Wood 2011). To address the dynamics of the activity, three concepts 
initially developed in anthropology and archaeology appear suitable to 
fully describe and analyze the course of actions: the chaîne opératoire, 
the technique, and the method. 

The chaîne opératoire originated in Leroi-Gourhan’s work on 
“material culture” (Leroi-Gourhan 1964). It provides a framework for a 
systematic description of the processes involved in a technical activity. It 
has been applied to a broad spectrum of craft contexts, past and present. It 
aims at describing the succession of phases involved in a technical 
process, mentioning actors and material, as well as the environmental and 
the social context in which the process takes place (Cresswell 2010; 
Lemonnier 1986). 

The technique refers to the physical modalities according to which 
the action is performed. In the case of lithic production, the technique 
refers to the production of a conchoidal fracture that is initiated at a point, 
which depends on the momentum delivered at the time of contact (Bril et 
al. 2009, 2012; Dibble and Pelcin 1995; Pelegrin 2005). This applies irre- 
spective of the knapping technique considered. Hence, the technique cor- 
responds to the minimal unit of functional action on the environment and 
refers to the physical mode of action. It cannot be split into smaller func- 
tional segments. 

Finally, to reach the goal, a sequence of interrelated actions must be 
carried out, involving one or more techniques; this sequence of actions to 
produce the desired outcome is regarded as the method (Inizan et al. 
1999). In other words, the method refers to how the technique or techniques 
are applied to reach the goal. Ultimately, the course of actions when reaching a 
goal can be described as the actualization of one or several techniques 
depending on the method considered. As such, techniques and methods, 
when operationalized, generate a potentially very large range of effective 
behaviors. This point will be developed in more detail in the second part. 

This perspective, which emphasizes the nature of action while 
focusing on the goal, assumes the technique to be the crux of any goal-directed 
action, with the method conveying its place in the whole process of 
reaching the goal. In other words, the method acts as a guide and is 
regarded as the knowledge necessary to go through the different 
subgoals needed. 

Based on the case of stone knapping, the second part of this paper 
presents a complete example of how to implement this approach 
and discusses its relevance for understanding expert behavior and, 
ultimately, learning. 


PART II. FROM “GOAL” TO “ACTIONS” TO “MOVEMENTS”: THE CASE 
OF STONE KNAPPING 


To illustrate the framework proposed here in order to better understand 
goal-directed behavior, the following sections present a general view of 
the results of two series of experiments completed over a period of more 
than 25 years. A first series of field experiments was initiated almost 
three decades ago in India and focused on knapping skills and learning in 
different groups of craftsmen. The whole knapping process was recorded 
and analyzed. At the time they were conducted, these experiments were 
considered groundbreaking, as instrumented hammers as well as sensors 
to record the movement of the hammer and the knappers’ upper limbs 
were used. The results suggested that knapping skills are grounded in a 
full mastery of the technique that takes years to acquire (Biryukova and 
Bril 2008; Bril et al. 2005, 2012; Roux et al. 1995). These results 
launched a second set of experiments focusing more extensively on the 
percussive action itself: the technique. 

Although the two knapping techniques under study (indirect percussion 
by counterblow with soft hammer and direct percussion with hard 
hammer) are very different, we considered a comparison between the two 
well founded for understanding knapping behavior. Indeed, regardless of 
the technique, what must be controlled are the parameters of the 
conchoidal fracture, which are the same in both cases. 


Cornelian bead knapping by Indian craftsmen 


One of the few places² in the world where stone knapping is still 
practiced is Khambhat, India, within the bead industry³ (Roux 2000; Bril 
et al. 2005); the technique used to make beads of different shapes is 
indirect percussion by counterblow (Pelegrin 1994, 2005; see Fig. 1). 

The original aim of the work was to understand why it takes 10 years 
to become a good knapper. More precisely, our ambition was to ten- 
tatively disentangle the different dimensions of expertise—i.e., the 
underlying abilities, the controlled factors, and the skills and knowledge. 
With this aim in mind, we set up “field experimentations” that consisted 
of (1) working with craftsmen in a situation as close as possible to their 
everyday activity, and (2) using recording devices as similar as possible 
to those in a laboratory setting. Field experiments provide the opportu- 
nity to work on real life behaviors with maximum control. They allow for 
analysis of parameters usually studied in laboratory experimental
situations.⁴

Craftsmen of different levels of expertise participated in the experi- 
ments. With bead knapping being a two-step process, the experiments 
focused on each of these stages—i.e., from the raw material to the rough- 
out and from the roughout to the preform. The objective of the initial 
experiments was to work on the knowledge of the method in relation to 
the features of the end product (i.e., roughout or preform) as well as the
characteristics of the stroke (the percussion), i.e., the mastering of the
technique. With regard to the first stage of knapping a bead (i.e., from
cobble to roughout), craftsmen of different levels of expertise had to
produce two types of roughouts from selected pebbles. The entire man- 
ufacturing process of each bead was videorecorded (Roux and David 
2005). A detailed analysis of the course of actions, namely, the succes- 


2 For modern humans practicing stone knapping, see: Stout D. 2002. Skill and
cognition in stone tool production: an ethnographic case study from Irian Jaya.
Current Anthropology 43(5): 693-722.

3 Bead knapping is a male-only activity. Young women and children work on
small leftover pieces from knappers to make small irregular beads by splitting
them up.

4 The results of these experiments are shown in the following papers: Biryukova
and Bril 2008; Biryukova et al. 2015; Bril et al. 2005, 2010; Nonaka and Bril
2012, 2014; Roux 2000; Roux et al. 1995.
Fig. 1. Illustration of the relationship between technique and method in
the making of an ellipsoidal shaped cornelian bead (Khambhat, India)
(adapted with permission from figures 4.1, 4.2 and 4.3 of Bril et al. 2005).

[Figure text]

The TECHNIQUE refers to the physical mode of action. In Khambhat the
flaking technique is an “indirect percussion by counter blow” with a soft
hammer and entails a striking action.

The METHOD refers to:
• How the TECHNIQUE is used to produce a product of a specific shape
• The spatial and temporal organisation of different flaking actions

Ellipsoid bead, from roughout to preform:
1. Calibration of the crest: final shaping of the crest by transverse removal
2. End preparation: preparation of micro-platforms or axial removing
3. Crest fluting
4. Axial removing
5. Reduction of the residual crest: very short transversal flakes
6. End finishing: short axial removals


sion of actions carried out with regard to the goal to be reached, was
performed.⁵ A second set of experiments dealt with the relationship between
the mastery of the technique and that of the method, allowing a simulta- 
neous analysis of the course of actions (the method), the percussive 
movements (the technique), and the resulting product (the bead). Crafts- 
men were asked to knap different preforms from roughouts of different 
shapes and different raw materials (cornelian and glass), with different 
hammers. The knapping sequences were also videorecorded: the hammer 


5 To this end, the sequences of actions were described and coded using
computer software used in ergonomics for behavioral time series analysis (for
more details, see Roux and David 2005; Bril et al. 2000, 2005).
head movement was recorded with an accelerometer, while the arm
movement was recorded with electromagnetic devices (Biryukova and 
Bril 2008, 2016; Bril et al. 2000, 2005; Roux et al. 1995). 

In both sets of experiments, the analysis of the course of actions very 
seldom disclosed any errors in the sequencing of the subgoals, regardless 
of the level of expertise of the craftsmen, which means that they all 
“knew” the methods. On the other hand, a look at the pieces immediately 
revealed their difference in skills (Bril et al. 2005; Roux and David 2005; 
Roux et al. 1995). All the craftsmen could produce a coherent “plan of
actions”; that is, they had a good awareness of the method (Roux and
David 2005; Roux et al. 1995). Differences emerged at the level of flake
removal, i.e., at the level of the technique. For almost all the
craftsmen, the methods appeared as guidelines for acting. However,
flaking mistakes, although rare among experts, were dealt with
differently depending on the skill level. After a failed removal, an expert
would perform a rejuvenation operation, whereas a less expert craftsman
would keep repeating the same mistake; the former would thus improve
the situation, whereas the latter would not.

In other words, the results obtained showed that the methods corre- 
spond to memorized master plans, which constitute a guide toward the 
goal. While pre-existing the action, the knowledge of the method is in no 
way sufficient for the craftsman to act effectively. It is possible to know 
the method and be unable to implement it if the percussive action (tech- 
nique) is not well mastered. 

The other important result of these experiments concerns the percus- 
sive action itself and consequently the concomitant hammer and body 
movements. The results showed that, irrespective of the level of expert- 
ise, important differences in postural preferences and movement profiles 
could be observed both on intraindividual and interindividual bases 
(Biryukova and Bril 2008; Biryukova et al. 2005; Bril et al. 2005; Roux 
et al. 1995). These results are of particular significance in explaining 
expertise and, subsequently, learning. To address this issue in more 
detail, new sets of experiments focusing on the percussive action (the 
technique) were completed with experimental archaeologists of different 
levels of expertise. In this new set of experiments, the technique under 
study was direct percussion with a hard hammer. 


The technique: the key to expertise 


As we have seen, a percussive task involves delivering a blow or a series 
of blows over an object with another object, typically held in the hand. 
This definition may be applied to any percussive activities. However, as 
already mentioned, this is only a description of the behavior that gives no 
explanation of why it is efficient. When working on goal-directed 
actions, the aim is to better understand the relationship of the three 
dimensions: What is the purpose of the action? What must be done? How 
must it be done? (Bernstein 1996: 234). Consequently, it can be notewor- 
Fig. 2. Illustration of the reciprocal relation between the WHAT and the
HOW: (1) Hammer movement that generates the fracture (WHAT), that is, how
the functional demand of the task is satisfied, (2) Kinematics of the arm,
which engender the hammer strike movement (HOW) (adapted with permission
from a slide shown by E. Biryukova, at the EHESS, in October 2011).

[Figure text]

The WHAT refers to the functional properties of the task that have to be
satisfied through the actor behavior.

FUNCTIONAL DEMAND OF THE TASK
• Kinetic energy at contact (½mv²)
  - Velocity at point of contact
  - Mass of hammer
• Position of the point of impact
• Hammer head position and orientation at contact (direction of the stroke
  relative to the striking surface)
• Exterior angle

The HOW refers to the actual motor activity that regulates the movement of
the hammer to produce the right velocity at contact.

BODY MOVEMENT
• Arm joint angles
• Bimanual coordination

thy to differentiate the WHAT of the action from the HOW (see Fig. 2).
The WHAT relates to the functional properties of the task that have to be 
satisfied through the actor’s behavior; more precisely, it refers to the rela- 
tionship between the hammer and the stone. The HOW refers to the 
actual motor activity of the knapper that regulates the movement of the 
hammer to produce the right velocity at contact; in other words, the 
HOW looks at the body and arm kinematics in terms of posture, joint
angle variations, and bimanual coordination.

A set of experiments (Nonaka et al. 2010; Rein et al. 2013, 2014; Parry
et al. 2014) was designed to better understand the relationship between 
the WHAT and the HOW. In other words, the aim of these experiments 
was to understand the relationship between the three dimensions of a 
goal-directed action at the level of the technique: (1) the purpose (to take 
off a flake characterized by specific shape and dimensions), (2) the 
WHAT (to produce the right kinetic energy at contact) and (3) the HOW 
(to perform a multijointed arm movement that will move the head of the 
hammer in such a manner that it will produce the right amount of kinetic 
energy at contact). 


Being an expert means fine-tuning of the functional parameters


Based on the assumption that the expert is the one who is able to
adaptively succeed regardless of the constraints on the knapping task (Bril
et al. 2000, 2005; Roux and David 2005; Roux et al. 1995), different
experiments explored the knappers’ flake production and movements
performed in both free and constrained situations. Across experiments, in
which task constraints varied (mass of the hammer and size of the flake 
to be removed), the results showed that, while all participants modified 
their behavior, the success rate was systematically higher for experts, 
with novices being hardly able to produce flakes of significantly different 
sizes when instructed to (Bril et al. 2010). When the size of the flakes to 
be produced was not imposed, experts produced flakes systematically 
larger than those produced by participants with less experience (Bril et al. 
2010; Nonaka et al. 2010). When using hammers of different masses, 
only experts were able to hold constant kinetic energy for a given flake 
size, a condition that enforces the adjustment of the velocity vector to the 
demand of the task (Biryukova and Bril 2008; Bril et al. 2010). 
Moreover, experts used consistently smaller kinetic energy, regardless of 
the conditions (producing large or small flakes, using light or heavy 
hammers). Since the kinetic energy produced by novices and intermedi- 
ate knappers was systematically greater than that which was needed (up 
to four times or even more for novices), this suggests that only experts 
were aware of the existence of a threshold mechanism underlying con- 
choidal fracturing (Bril et al. 2010). These results clearly suggest that the 
degree of attunement of kinetic energy to the task demand can be seen as 
a direct indicator of the knappers’ skill level. 
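
The relation between hammer mass, contact velocity, and kinetic energy can be made concrete with a short numerical sketch. The Python example below is illustrative only: the function names, the hammer masses, and the target energy are assumptions chosen for the example and are not values reported in the cited experiments. It shows that holding kinetic energy constant across hammers of different masses requires scaling the contact velocity with the inverse square root of the mass.

```python
import math

def kinetic_energy(mass_kg, velocity_ms):
    """Kinetic energy at contact, E = 1/2 * m * v**2 (joules)."""
    return 0.5 * mass_kg * velocity_ms ** 2

def velocity_for_energy(mass_kg, energy_j):
    """Contact velocity needed to deliver a given kinetic energy with a given hammer mass."""
    return math.sqrt(2.0 * energy_j / mass_kg)

# Hypothetical values, for illustration only (not experimental data).
target_energy_j = 2.0                # energy assumed sufficient for a given flake size
hammer_masses_kg = [0.2, 0.4, 0.8]   # light, medium, and heavy hammer

for m in hammer_masses_kg:
    v = velocity_for_energy(m, target_energy_j)
    print(f"mass {m:.1f} kg -> velocity {v:.2f} m/s, energy {kinetic_energy(m, v):.2f} J")
```

Under these assumptions, doubling the hammer mass lowers the required velocity by a factor equal to the square root of 2, which is the kind of adjustment that, according to the results above, only experts made reliably.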

These results were confirmed when knappers were asked to predict the
flake they intended to produce:⁶ only high-level experts were able to
produce a flake close to what they predicted in terms of dimensions (length
and width) and position of the point of percussion. In addition, and this is
critical, only experts generated a value of kinetic energy correlated with
the dimensions of the predicted flake (Nonaka et al. 2010).

These results indicate that one reason why the outcome of the flaking
process does not always meet the desired goal is the inability to
produce a succession of flake removals that meets the requirements of the
predicted shape.


From action to movement 


We have seen that in knapping, the velocity of the hammer has to be con- 
trolled to produce the required kinetic energy in relation to the mass of 
the hammer. As the functional parameters are imposed by the knapping 
task, once the hammer mass has been chosen, its velocity becomes the 
ultimate functional parameter to be controlled. However, it may be regu- 
lated through various strategies since, for a biological system, the effi- 
ciency of a blow can be defined in terms of potential and kinetic energy. 
The actor may rely on a large potential energy that corresponds to a wide 


6 Knappers were first instructed to draw on the core with a marker the outline
of the flake they intended to detach, and then to detach the flake as predicted
through direct percussion with a hammerstone.
range of the vertical component of the trajectory of the hammer. On the
other hand, to reach the same velocity, a small amplitude of the trajectory 
will require additional muscular energy. Indeed, due to the great number 
of degrees of freedom of the human body, there is an infinite number of 
ways to produce a given value of kinetic energy. Hence a greater flexibil- 
ity can be achieved by concurrently changing the trajectory of the 
hammer, the amplitude of the movement and the muscular force. This 
explains why large variations in movement are observed both within and 
between individuals. 
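
The trade-off between movement amplitude and muscular effort can be sketched with an idealized energy balance. The following Python example treats the hammer as a point mass, ignores the mass of the arm and all losses, and uses hypothetical values; it is a simplification for illustration and not the model used in the cited studies. It computes how much of a target kinetic energy gravity can supply over a given vertical drop and how much must be added by muscular work.

```python
G = 9.81  # gravitational acceleration, m/s^2

def energy_budget(mass_kg, drop_height_m, target_energy_j):
    """Split a target kinetic energy at contact into the part supplied by
    gravity (m * g * h) and the remainder that muscles must add.
    The muscular part is clipped at zero: braking is not modeled here."""
    from_gravity = mass_kg * G * drop_height_m
    from_muscles = max(0.0, target_energy_j - from_gravity)
    return from_gravity, from_muscles

# Hypothetical hammer mass and target energy, for illustration only.
mass_kg = 0.5
target_energy_j = 2.0

for height_m in (0.10, 0.25, 0.40):
    gravity_j, muscle_j = energy_budget(mass_kg, height_m, target_energy_j)
    print(f"drop {height_m:.2f} m: gravity {gravity_j:.2f} J, muscles {muscle_j:.2f} J")
```

In this toy model, a small amplitude leaves most of the energy to be produced by the muscles, whereas a large amplitude lets gravity do most of the work; many intermediate combinations reach the same kinetic energy at contact.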

The observed variability of strategy in body movements is sustained 
by the multiple degrees of freedom of the body at the level of joints and 
muscles (Bernstein 1996; Biryukova and Bril 2002; Latash 2012). 
Indeed, important differences are observed in postural preference and 
movement profiles both on intraindividual and interindividual bases 
(Biryukova and Bril 2008; Biryukova et al. 2015; Parry et al. 2014; Rein 
et al. 2014). For example, when examining a strike (a percussive move- 
ment), the initial and final arm positions, as well as the range of motion of 
joint angles of the wrist and the elbow, vary from one individual to 
another, again, irrespective of the level of expertise. These results support
the hypothesis of individual motor solutions to a common motor problem
(Rein et al. 2014); each individual builds up favored personal motor
patterns depending on individual experience and anatomical
configuration.

If no clear-cut difference of movement kinematics explains the recog- 
nized level of expertise, what characterizes skilled movement in stone 
knapping? What element in the knapper’s movement makes the differ- 
ence? An analysis of the relationship between the hammer movement and 
the kinematic chain of the arm makes it possible to study the arm movement
strategy (dynamics of the joint movements) in relation to the
movement of the hammer (functional parameters) (see Scholz and
Schöner 1999).⁷ It is then possible to differentiate, among the distinctive
joint movements, those that detrimentally affect the movement of the hammer
(task performance), i.e., the components that alter the performance, the
“bad” variations, from the joint movements that have no influence on
the hammer movement, i.e., those that do not affect the functional
trajectory of the hammer, the “good” variations. In other words, this kind of
computation studies the kinematic chain (of the arm) in terms of the con- 
tribution of the joints that produce, or not, a deviation of stroke execution 
from functionality. This method has been applied to both knapping tech- 
niques (indirect percussion by counterblow and direct percussion) and 
shows that for both cases, regardless of the skill level, the fluctuation of 
the arm joint configuration that leaves the performance constant (i.e.,
that does not affect the position of the hammer) is greater than the
fluctuations that affect the position of the hammer. Furthermore, for both
techniques, expert knappers display significantly smaller variability
(good and bad) than less skilled knappers (Biryukova et al. 2015; Rein et
al. 2013).


7 Uncontrolled manifold (UCM) analysis (see Scholz and Schöner 1999).

Thus, while movement kinematics of the arm appears to be specific to 
each individual knapper, optimizing the trajectory of the hammer appears 
to be an important performance variable in stone knapping skill. 
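
The logic of separating “good” from “bad” joint variability can be illustrated with a deliberately reduced sketch. The Python example below assumes a planar arm with only two joints and a single task variable (the height of the hammer head); the segment lengths, the synthetic trials, and all function names are hypothetical. The analyses cited above relied on recorded data and richer arm models, so this is only a toy instance of the uncontrolled manifold idea: across-trial deviations of the joint angles are projected onto the direction that leaves the task variable unchanged and onto the direction that changes it.

```python
import math
import random

L1, L2 = 0.30, 0.35  # hypothetical segment lengths (m)

def hammer_height(q1, q2):
    """Task variable: vertical position of the hammer head for a planar two-joint arm."""
    return L1 * math.sin(q1) + L2 * math.sin(q1 + q2)

def jacobian(q1, q2):
    """Partial derivatives of hammer height with respect to each joint angle."""
    return (L1 * math.cos(q1) + L2 * math.cos(q1 + q2), L2 * math.cos(q1 + q2))

def ucm_variances(trials):
    """Split across-trial joint variance into a part that leaves hammer height
    unchanged ("good") and a part that changes it ("bad")."""
    n = len(trials)
    mean_q1 = sum(q1 for q1, _ in trials) / n
    mean_q2 = sum(q2 for _, q2 in trials) / n
    j1, j2 = jacobian(mean_q1, mean_q2)   # linearize at the mean posture
    norm = math.hypot(j1, j2)
    bad_dir = (j1 / norm, j2 / norm)      # direction that changes hammer height
    good_dir = (-j2 / norm, j1 / norm)    # null space of the Jacobian
    good = bad = 0.0
    for q1, q2 in trials:
        d1, d2 = q1 - mean_q1, q2 - mean_q2
        good += (d1 * good_dir[0] + d2 * good_dir[1]) ** 2
        bad += (d1 * bad_dir[0] + d2 * bad_dir[1]) ** 2
    # Both subspaces are one-dimensional here, so no further normalization is needed.
    return good / n, bad / n

# Synthetic trials (not experimental data): large compensated variation,
# small variation that affects the hammer height.
random.seed(0)
base = (0.6, 0.9)
j1, j2 = jacobian(*base)
norm = math.hypot(j1, j2)
trials = []
for _ in range(200):
    a = random.gauss(0.0, 0.08)   # along the "good" direction
    b = random.gauss(0.0, 0.02)   # along the "bad" direction
    trials.append((base[0] - a * j2 / norm + b * j1 / norm,
                   base[1] + a * j1 / norm + b * j2 / norm))

v_good, v_bad = ucm_variances(trials)
print(f"good variance {v_good:.5f} rad^2, bad variance {v_bad:.5f} rad^2")
```

With more joints and a multidimensional task variable, the two directions become subspaces and each variance is usually divided by the dimension of its subspace before the comparison is made.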


Stone knapping, a bimanual activity 


Although the percussive movement per se is unimanual, the bimanual
nature of flaking is part of its complexity. Indeed, the control of the
percussive hand is unmistakably rooted in that of a bimanual system in which
each hand is engaged in a qualitatively differentiated role while
cooperating with the other to achieve an overall goal (Nonaka and Bril 2012).
Addressing this question is important, as the non-striking hand (the
supporting hand) is used not only to stabilize the core but also to
provide the appropriate relative orientation of the core so that the striking
location and angle of strike can be reached. An analysis of bimanual
activity has been performed for the technique of percussion by counter
blow in the production of carnelian beads, based on recurrence methods of
analysis,⁸ with craftsmen of different levels making beads out of two
different raw materials (carnelian and glass) (Nonaka and Bril 2012).
Evidence was found that the movements of the two hands were func- 
tionally linked, reflecting the roles assumed by each hand. However, the 
dynamics of bimanual movement exhibited more stable and determinis- 
tic coupling with high level experts, although, at the same time, their 
hammering arm movements showed greater variability in amplitude and 
frequency (Biryukova and Bril 2008; Nonaka and Bril 2012). This appar- 
ent inconsistency suggests that the observed deterministic structure of the 
bimanual dynamics does not stem from the stereotypy of the hammering 
arm movement. Furthermore, the bimanual coordination is embedded in 
the context of the task function. In other words, more so for high level 
experts, the dynamics of bimanual coordination reflect the functional 
demands of the task—i.e., type of subgoal (see Fig. 1)—with the 
dynamics of bimanual coordination being more stable and less noisy for 
more demanding tasks (Nonaka and Bril 2012). In short, it can be 
hypothesized that, although the two arms have quite different functions,
with increasing levels of skill they become more and more functionally
linked, thus reducing the number of parameters to be controlled
(Biryukova and Bril 2008; Jancke 2006).
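
For readers unfamiliar with recurrence-based measures of coupling, the following Python sketch shows the basic quantities on which such an analysis rests. It is a strongly simplified illustration under stated assumptions: the two signals are synthetic sinusoids standing in for the movements of the two hands, no phase-space embedding or normalization is applied, and the radius and function names are hypothetical. It does not reproduce the procedure of the cited study.

```python
import math

def cross_recurrence(x, y, radius):
    """Cross-recurrence matrix: entry [i][j] is True when x[i] and y[j] differ by at most radius."""
    return [[abs(xi - yj) <= radius for yj in y] for xi in x]

def recurrence_rate(matrix):
    """Fraction of all (i, j) pairs that are recurrent."""
    total = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / total

def determinism(matrix, min_line=2):
    """Fraction of recurrent points lying on diagonal lines of at least min_line points."""
    rows, cols = len(matrix), len(matrix[0])
    recurrent = sum(sum(row) for row in matrix)
    on_lines = 0
    for offset in range(-(rows - 1), cols):   # walk along every diagonal
        run = 0
        i, j = max(0, -offset), max(0, offset)
        while i < rows and j < cols:
            if matrix[i][j]:
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_line:                   # close a run that reaches the border
            on_lines += run
    return on_lines / recurrent if recurrent else 0.0

# Synthetic signals standing in for the two hands (not experimental data).
n = 100
hammer_hand = [math.sin(2 * math.pi * t / 20) for t in range(n)]
support_hand = [math.sin(2 * math.pi * (t - 3) / 20) for t in range(n)]

m = cross_recurrence(hammer_hand, support_hand, radius=0.1)
print(f"recurrence rate {recurrence_rate(m):.3f}, determinism {determinism(m):.3f}")
```

In this simplified setting, a higher determinism value means that episodes in which the two signals visit similar states tend to persist over consecutive samples, which is one way to express a more stable and deterministic coupling.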


8 A cross recurrence quantification analysis (Shockley 2012) was performed on
the time series data of the two hand movements for two different subgoals
necessary for the fabrication of a bead of ellipsoidal shape.
Fig. 3. From action to movement in the case of stone knapping. With the
exception of the exterior platform angle, all the parameters, in some way
or another, have to be controlled in any percussive task. Only movement
parameters are recorded and allow for computation of regulatory and
control parameters (adapted from Bril 2018; Bril et al. 2012).


A functional model


A summary of the functional approach to the knapping technique 
reviewed in the present paper is presented in Figure 3. This model is 
grounded in the mechanical constraints of the conchoidal fracture. These 
constraints are imposed on knapping behavior and prompt the knapper’s 
behavior. In other words, the conchoidal fracture mechanics dictate the 
behavior of the knapper, whether a human or a robot! Knapping skill or 
flaking skill—more restrictively here as the technique is considered— 
may be defined as the capacity to respond as satisfactorily as possible to 
the goal the actor is engaged in, regardless of the conditions—i.e., pro- 
ducing a flake with the desired characteristics in terms of size, thickness, 
raw material and environmental conditions—so as to fully participate in 
the sequence of flaking defined by the overall goal of the knapper. 

The knapping behavior is presented under the three levels discussed 
earlier: (1) the functional or control parameters, (2) the parameters of 
strategy, and (3) the movement parameters. While the functional or con- 
trol parameters (the WHAT) are imposed by the task (the kind of flake to 
be produced), the parameters of strategy of action and strategy of movement
(the HOW) vary considerably among knappers. From a methodological
point of view, it is important here to remember that only the motion
of material objects can be directly recorded. Hence, the different param- 
eters defining the knapper’s motor behavior will be computed from the 
recording of a few specific parts of the body and the tool (the hammer) 
based on a model of the body. The chosen model of the body will depend
on the intended focus of the analysis; although this is a biomechanical
issue, it has to be carefully considered (Biryukova et al. 2000; Hogan
1985).

While the present discussion is restricted to the motor behavior, it has 
to be completed by an analysis of the different sensory means (visual, 
proprioceptive, kinesthetic, tactile, and even vestibular), which are
necessary to inform the actor about the state of the whole system.
Throughout his work, Bernstein (1967, 1996) continuously refers to the
importance of sensory information, necessary to ensure the control of motor
action; this issue falls beyond the present discussion and needs further
development. However, this functional model offers an efficient way to 
describe goal-directed behavior by differentiating behavioral organiza- 
tional levels. 


CONCLUSION: THE NEED TO DIFFERENTIATE TECHNIQUE FROM 
METHOD FOR UNDERSTANDING STONE KNAPPING SKILLS 


Reviewing a series of experiments on stone knapping behavior
conducted over 25 years, this paper reveals how a well-designed
description of behavior, based on the functionality of action, supports a better
understanding of skilled goal-directed behavior. We argue here that dis- 
tinguishing the functional constraints exclusive to the task and the multi- 
ple potential ways to behaviorally satisfy these constraints allows for a 
better understanding of knapping skills and learning. In other words,
once the goal is specified, these functional constraints are given and must
be fulfilled, even though different ways to solve the task exist.

Indeed, in the knapping literature, a lack of “well developed models 
of technical cognition” (Wynn and Coolidge 2017) or technical thinking 
(Malafouris 2021) is often mentioned and gives rise to questions such
as: “How do the knapper’s intention, perception and action relate?” or
“Where does the ‘thinking’ stop and the ‘flaking’ begin?” (Malafouris
2021: 107-108). Assuming that these issues are relevant, I suggest that
these questions cannot be answered without a functional description of 
the task. It is often assumed that “technical thinking” is based on various 
cognitive components often referred to as haptic perception, spatial cog- 
nition, long term or working memory, semantic knowledge, etc. (Wynn 
and Coolidge 2017). Nevertheless, these are not specific to stone knap- 
ping. 

Most of the time, studies about stone knapping skills are based on the 
distinction between the cognitive and the sensori-motor dimensions—the 
emphasis being on the cognitive component (Pargeter et al. 2020). This is 
especially noticeable when considering the long-accepted distinction 
between “abstract knowledge” (connaissance) and practical know-how 
(savoir-faire), introduced long ago by J. Pelegrin (1991, 2005), with the
sensori-motor component receiving much less interest, being often con- 
sidered a “biomechanical question”, a quite basic action involving little 
cognition. However, quite recently, Pargeter et al. (2020) restated the 
importance of understanding motor coordination and control over knap- 
ping. 

Indeed, what is missing from most studies could well be an
in-depth description of what knapping is and, more specifically, what
flaking stands for. Whereas the mechanical principles of the conchoidal
fracture have been well defined (Dibble 1997; Dibble et al. 1995; Li et al. 
2022; Pelegrin 2005), the knapping activity in terms of “behavior” has 
not been considered in depth. In particular, the manner in which the
knapper handles the mechanical properties of the conchoidal fracture 
needs to be addressed in greater depth (Li et al. 2022). 

In this review, I hope to have substantiated the benefit of clearly
differentiating the method from the technique. The technique, referring to the
physical modalities of the production of a flake, appears as the minimal 
unit of action. While it is usually accepted that “Oldowan-level flaking 
proficiency can be achieved by modern knappers within just a few hours 
of practice” (Pargeter et al. 2020: 4), the results discussed in this review 
show that a task such as flaking, usually presented as simple and unelabo- 
rated, is of substantial complexity. A fine mastery of flaking requires 
years of practice, which turns out to be the prerequisite for successful 
implementation of the method. 

Our approach to goal-directed action may be qualified as “bottom-up”
as opposed to the long-standing “top-down” mainstream view, which
regards the brain as the major controller of action (Biryukova and
Sirotkina 2020). It is assumed that the mechanical requirements
necessary to produce a conchoidal fracture impose on the actor the production
of the adequate quantity of energy (kinetic energy at contact) that yields a
well-defined flake. On this basis, what has to be learned is assumed to be
not a movement but the capacity to finely tune
this kinetic energy, that is, to choose the right weight of the hammer
and produce the right velocity vector at the point of impact of the hammer
on the stone.

Due to the huge number of degrees of freedom of the human organ- 
ism (Bernstein 1967, 1996; Latash 2012), understanding behavior is no 
easy task. Indeed, many ways are possible to reach a goal. The present 
review has shown that this is the case at the behavioral level, the move- 
ment being specific to each knapper. Furthermore, as Krakauer and col- 
leagues stress in their paper titled “Neuroscience needs behavior” (2017), 
this is the case as well when considering the neural level where multiple 
possible patterns of activity may engender a single natural behavior.
Conversely, a single pattern of brain activity can map onto multiple natural
behaviors. In a similar way, resulting movements can be achieved by dif- 
ferent muscle coordination patterns (Bernstein 1967). 

As underlined by Krakauer and colleagues in their plea to enlarge in- 
depth studies of behavior, “the neural basis of behavior cannot be prop- 
erly characterized without first allowing for independent detailed study 
of the behavior itself” (2017: 488). I hope that the studies on stone
knapping reviewed here provide a good illustration of the critical importance
of developing conceptual frameworks based on “bottom-up” models 
(Biryukova and Sirotkina 2020) for understanding complex real-life 
behavior. 
CHAPTER FIVE


Ape knapping then and now: Limited social learning
of sharp stone-tool making and use in naive
non-human apes


Alba Motes-Rodrigo, Claudio Tennie


Abstract 


Ape knapping experiments complement human knapping experiments as a 
source of behavioral data to build hypotheses about the learning mechanisms 
underlying the acquisition of knapping skills in extinct hominins. In addition, ape 
knapping experiments provide information regarding the stone-related behaviors 
that could have preceded the systematic production and use of sharp stones in 
our lineage. In this chapter, we review previous ape knapping experiments with a 
focus on those that tested apes’ abilities to socially learn from human demon- 
strators. Two studies, investigating one orangutan and one bonobo, concluded 
that both apes could socially learn sharp stone tool-making and use from human 
demonstrations. These results were interpreted as evidence of the reliance of 
early hominins on social learning to acquire knapping skills. However, alternative 
explanations exist. We provide novel data from two experiments investigating the 
abilities of the two previously untested great ape species (chimpanzees and goril- 
las) to learn knapping from human demonstrations. Contrary to the previous 
studies, the chimpanzees and gorillas in our experiments did not acquire sharp 
stone tool-making or use socially from human demonstrations. However, the apes 
we tested frequently manipulated the testing materials and two chimpanzees 
engaged in two events of lithic percussion involving an active hammer (although 
these actions did not lead to flake detachment). Our results suggest that the 
observation of human demonstrations is insufficient for the tested apes to 
acquire knapping abilities. This disparity in results between studies is unlikely to 
be explained by species differences in tool-use proficiency but rather by the par- 
ticular rearing background of the previously tested individuals. We discuss how 
our previous results on both the individual and social learning abilities of
unenculturated, untrained orangutans compare to our new results on gorillas and
chimpanzees. In addition, we comment on the general implications of ape knapping
experiments for understanding the likely origin and maintenance of knapping 
skills in pre-modern hominins. 


INTRODUCTION 


The production and use of sharp stone tools are often considered one of 
the most important innovations in human evolution. The use of sharp 
stone tools allowed our ancestors to access new food sources and engage 
in a variety of newly available foraging behaviors such as butchering big 
prey or plant processing techniques (Keeley and Toth 1981; Potts and 
Shipman 1981). Despite the abundance of sharp stone tools in the archeo- 
logical record, these artifacts on their own are silent regarding the learn- 
ing process that naive hominins underwent to acquire the skills needed 
for their production and use. Several research avenues, usually involving 
comparative models, can be pursued in order to investigate this question. 
One such avenue is to test how modern humans in experimental archae- 
ological studies learn to make and use sharp stone tools (Morgan et al. 
2015; Nonaka et al. 2010; Pargeter et al. 2019; Stout et al. 2015; Stout 
and Semaw 2006). This approach has a long history and presents multi- 
ple advantages (Bordes 1969; Eren et al. 2016; Toth 1982, 1985). From 
a practical perspective, human participants can communicate their per- 
ception of their thought processes (e.g., why they are choosing a particu- 
lar striking platform or hammerstone) as well as receive instructions 
before and during the experiments (Morgan et al. 2015; Nonaka et al. 
2010). From a theoretical point of view, modern humans are the extant 
species phylogenetically closest to early hominins, potentially allowing
researchers to build behavioral and cognitive models of earlier hominin species
based on modern human behavior (Stout and Semaw 2006; Toth 1985). 
However, knapping experiments with humans also have limitations. 
First, it is often difficult to ensure the naivety of the human subjects to 
knapping (as they might have seen movies or museum artifacts) and thus 
it is difficult to disentangle the origin of knapping skills if they are 
expressed during the experiment (though see Snyder et al. 2022 for a 
method used to overcome this limitation using post-test questionnaires 
and baseline testing). A counterargument to this limitation is that pre- 
vious knowledge of the task (if any) would most probably involve the 
final product rather than the knapping technique, meaning that partici- 
pants would likely be naive to the production process (though even here, 
reverse engineering of technique is a possibility and essentially how most 
lithic analysis must be conducted today given that the production pro- 
cesses are unobservable). The second limitation is that it is often prob- 
lematic to ensure that other skills in the participant’s repertoire are not 
being brought to and repurposed for the knapping task (“indirect know- 
how”; e.g., familiarity with how to break glass, Snyder et al. 2022). 
However, other behaviors in the early hominin repertoire (e.g., stone
throwing, Kühl et al. 2016) could have also preceded and influenced
knapping in extinct hominins (Carvalho et al. 2008; Panger et al. 2002),
although different learning mechanisms could have underlain their acquisition.
The third limitation is that despite being phylogenetically closer to
hominins than any other extant primate, the phylogenetic proximity 
between modern humans and hominins decreases as we explore earlier 
species. This decrease in relatedness could therefore undermine the use 
of modern humans as models of earlier hominin species. The fourth lim- 
itation of knapping experiments with modern humans is that most mod- 
ern human populations currently inhabit an environment very dissimilar 
from the one inhabited by our early ancestors (Faith et al. 2019). In 
addition, many modern humans have more sedentary lifestyles compared 
to the nomadic hunter-gathering lifestyle of early hominins (Marlowe 
2005). However, such different living conditions between modern and 
extinct humans could actually be advantageous for experimenters trying 
to control for previous experience in human knapping studies, as the pro- 
duction and use of stone tools in most modern human populations is rare. 

In parallel to human knapping experiments, a research avenue was 
initiated in the 1970s to investigate the stone tool-making and -using 
abilities of non-human primates under controlled conditions. Primate 
knapping studies have the advantage that it is relatively easy to ensure the 
naivety of the subjects regarding sharp stone tool-making and -use. Most 
captive-born primates have been under long-term monitoring and their 
previous tool-making and -using experiences are known. Primate studies 
(particularly those involving the genus Pan) can be informative regard- 
ing the stone tool repertoires and abilities of older hominin and hominoid 
species closer to the last common ancestors than our species (Bandini et 
al. 2022; Carvalho and McGrew 2012; McGrew 2010; Wynn et al. 2011). 
However, as we will see, primate knapping studies are not free of limita- 
tions either. 

Prior to our studies, two research projects investigated the abilities of 
non-human great apes (henceforth apes) to make and use sharp stone 
tools (several bonobos, Pan paniscus, and one orangutan, Pongo pyg- 
maeus; Toth et al. 1993, 2006; Wright 1972). The goal of these projects 
was to assess if great apes could acquire sharp stone tool-making and
-using abilities via the observation of human demonstrations, and if they
could, to evaluate the extent of these abilities. The first ape knapping 
experiment was conducted by Wright (1972) who tested a juvenile male 
orangutan. The orangutan in Wright’s study (Abang) was provided with a 
stabilized and pre-shaped flint core fixed on a wooden platform, as well 
as a hammerstone and an opaque puzzle box baited with food. The lid of 
the box was kept closed by a rope, which needed to be cut with a sharp 
object in order to open the lid and thus access the food rewards contained 
inside the box. This puzzle box represented an indirect task in which the 
action of cutting with a sharp object did not grant immediate access to the 
food rewards but instead allowed a door to open through which food 
could be obtained. In a first experiment, Abang was given demonstrations
of how to use a human-made sharp-edged stone as a cutting tool. In
addition, Abang experienced at least one instance of molding where the 
keeper guided Abang’s hand to sever the rope with the stone. These dem- 
onstrations and molding were followed by test trials where the orangutan 
could explore and manipulate the testing materials (i.e., human-made 
sharp-stones and puzzle box). After nine demonstrations, Abang used a 
human-made sharp stone as a cutting tool to sever the rope keeping the 
puzzle box closed. In a second experiment, Wright demonstrated to 
Abang how to produce sharp stones by using a hand-held hammer to 
strike on a hand-held core. This knapping technique is known as “free- 
hand knapping” and it was not actually available to the orangutan during 
the trials because the core was fixed on a board (see Fig. 1). During the 
10th testing day and after 16 demonstrations, Abang produced his own 
sharp-edged stones by striking a hand-held hammerstone against the 
fixed core. Abang then proceeded to use one of these sharp-edged stones
as a cutting tool to sever the lock-rope, open the baited box and obtain
the food rewards (Wright 1972; https://www.youtube.com/watch?v=3exAOxSKYCE).

In the 1990s, Schick, Toth and colleagues similarly tested the lan- 
guage-trained bonobo Kanzi on his abilities to use and produce stone 
tools after being exposed to demonstrations by a human model (Toth et 
al. 1993). Additionally, the researchers investigated the development of 
these skills over a period of several years (Schick et al. 1999; Toth et al. 
2006). In the first set of experiments, Kanzi was provided with human 
demonstrations (before the start of the tests) on how to produce sharp- 
edged stones using the freehand knapping technique described above as 
well as on how to use these sharp-edged stones as cutting tools to sever a 
rope (Toth et al. 1993). After having been exposed to these demonstra- 
tions, Kanzi was given hammerstones, loose stones of different sizes and 
raw materials (although eventually, only chert cobbles were used in the 
tests), and a puzzle box with a rope-lock similar to the one used by 
Wright baited with food (see above). In later experiments, Kanzi was also
provided with a second “drum-like” puzzle box. This second box consisted
of a wooden cylinder covered by a transparent plastic lid that
needed to be cut in order to (directly) access the food placed inside the 
cylinder. On the first day of the first stone tool-use experiment, after hav- 
ing observed human demonstrations of stone tool-production and -use, 
Kanzi started to use human-made flakes as cutting tools to sever the rope- 
lock of the baited puzzle box. In a follow-up experiment, Kanzi’s abilities 
to produce sharp-edged stone tools after observing human demonstra- 
tions of the freehand knapping technique were evaluated. After a month 
of trials and several unsuccessful attempts at producing sharp-edged 
stones using the freehand knapping technique, Kanzi developed his own 
technique to initiate stone fracture where he threw a cobble against a hard 
surface (“throwing technique,” Fig. 1). This innovative solution was not 
the type of solution originally intended by the experimenters and so 
attempts were made to discourage this behavior by moving the experi- 
ments to an outdoor enclosure with soft ground. Additionally, he also 
innovated a variant of the throwing technique where he threw a cobble 
against another stationary cobble (“directed throwing technique,” Fig. 1). 
However, the timing and context of this innovation are not described (Toth
et al. 1993). Eventually, Kanzi also successfully performed the demon- 
strated freehand technique. In later years, Kanzi preferably used the two 
throwing techniques over the freehand knapping technique demonstrated 
to him by humans (Schick et al. 1999). Kanzi’s half-sister Panbanisha 
was later reported to have learnt to use and produce sharp-edged stone 
tools via freehand percussion after observing a human demonstrator 
(Savage-Rumbaugh and Fields 2006). Similarly, Panbanisha’s two sons
were also reported to have acquired sharp-edged stone-making and
-using skills via the observation of the two more experienced bonobos
(Kanzi and Panbanisha). However, neither the learning process nor the
bonobos’ knapping skills were tested or described in detail (Toth et al.
2006), and these reports should therefore be considered with caution.

Despite being highly innovative at the time, the ape knapping studies 
described above present a series of methodological limitations in light of 
recent developments in animal cognition research. Specifically, all indi- 
viduals tested in these previous studies were highly- or at least semi- 
enculturated, meaning that they were partly raised in a human cultural 
environment that included extensive direct human contact and intentional 
training (Furlong et al. 2008; Henrich and Tennie 2017). This is problem- 
atic for making inferences based on the results of these earlier studies. 
Enculturated apes cannot be considered representative models of their 
wild counterparts, nor of their last common ancestors with the hominin 
lineage (Henrich and Tennie 2017). Indeed, it is widely accepted that one 
of the tested apes (the bonobo Kanzi) represents one of the most extreme 
cases of such human enculturation (Savage-Rumbaugh et al. 1986). As 
for Abang the orangutan, a former ape keeper from the zoo where the 
experiments took place told us (AMR) in 2019 that it was common practice
in the 1960s and 1970s for the keepers in Bristol Zoo (UK, where
Abang was located at the time) to enter the enclosure with the orangutans
as well as to take them for walks around the zoo grounds. Such close con- 
tacts and frequent human interactions (forbidden in most zoos today) 
suggest that Abang the orangutan was at least semi-enculturated (Henrich 
and Tennie 2017). In Abang’s case, there is the additional limitation that 
molding took place during the first experiment exploring sharp stone 
tool-use. Abang’s keeper physically guided Abang’s hands to sever the 
rope-lock using a sharp stone. This instance of molding confounds the 
results of Abang’s learning process of sharp stone tool-use (though note 
that Wright did not report any molding taking place during or before the 
sharp-stone tool-making experiments). Relatedly, cobbles were actively 
placed into Kanzi’s hands to promote stone tool-making (Savage-Rum- 
baugh and Fields 2006). Finally, the reduced sample sizes (N=1) in both 
studies (Toth et al. 1993; Wright 1972) leave open the question of the 
generalizability of their results to other ape subjects of the same and other 
species (though some limited data exist for three more bonobos, see 
above). 

In the present chapter, we expand on the findings of these early 
experiments, first, by testing the two remaining species of great apes, 
namely chimpanzees and gorillas, and, second, by addressing some of the 
limitations and confounds in these early ape experiments. We investi- 
gated—for the first time—whether group-housed chimpanzees and goril- 
las can acquire sharp stone tool-making and -using abilities via the obser- 
vation of human demonstrations. We tested a group of chimpanzees 
including individuals both with limited levels of enculturation (semi- 
enculturated) and unenculturated as well as a group of unenculturated 
gorillas. Here, we use the term semi-enculturated as defined by Henrich
and Tennie (2017) to refer to apes that were hand-reared while they were 
young but then lived most of their lives in a conspecific group at a zoo- 
logical institution. By unenculturated individuals we refer to captive 
individuals which have always lived in conspecific groups, are mother- 
reared and have not received extensive human training (e.g., except for 
veterinary procedures). We ensured the naivety of the test subjects to the 
target tasks by interviewing the ape keepers and confirming the lack of 
previous stone tool-making and -using experience of the apes both within 
and outside previous experiments. If naive, unenculturated, untrained
chimpanzees and gorillas were to develop sharp stone tool-making and
-using abilities following human demonstrations (but not on their own),
this would mean that the results of Wright and Toth et al. might be gener- 
alizable to all great ape species. In turn, such findings would suggest that 
human demonstrations suffice for the acquisition of these abilities even 
in naive, untrained, unenculturated individuals. Such results would also 
provide support for the hypothesis that the earliest stone tool-using hominins
learnt stone tool-making and -using abilities via the observation of
other individuals’ behavior. If naive, unenculturated and untrained chimpanzees
and gorillas did not express the target behaviors after observing
human demonstrations, this would suggest that human training
and/or enculturation, perhaps in addition to human demonstrations, are
required for the acquisition of these behaviors in these species (compare
Bandini, Motes-Rodrigo et al. 2021, 2022). Such findings would suggest 
that the background and training of the bonobos and orangutans tested in 
the previous ape experiments had a strong influence on the positive find- 
ings observed (causing enhanced innovativeness and/or increased social 
learning abilities). 


MATERIALS AND METHODS 
Subjects and housing 


The subjects of the experiments presented here were gorillas
(Nadults = 2F and 2M; Njuvenile = 1M; Ninfant = 1M; mean age = 23.5
years ± 16) and chimpanzees (Nadults = 7F and 6M; mean age = 33
years ± 11) housed at Twycross Zoo (Atherstone, UK). The chimpanzees
were housed in a group that included seven human-reared, three mother- 
reared and three individuals of unknown rearing. The gorillas were 
housed in a group that included four mother-reared individuals and two 
individuals of unknown rearing. All these apes had access to indoor and 
outdoor enclosures as well as to quarters off-sight from the visitors. All 
ape indoor enclosures were equipped with environmental enrichment 
such as climbing frames, bedding materials, platforms and containers 
where food could be placed for the apes to retrieve. The floor of the 
indoor enclosures was covered with wooden chips and straw. The apes’ 
outdoor enclosures consisted of grassed areas surrounded by glass walls 
from where visitors could observe the apes. The outdoor enclosures 
included climbing frames and huts. Feedings took place several times a 
day when food (fruit, vegetables, primate pellets and nuts) was scattered 
in the indoor and outdoor enclosures. Food was often placed inside 
enrichment devices such as hanging balls and boxes attached to the 
meshes. Water was available ad libitum in all enclosures. The experi- 
ments took place in the off-sight quarters connected to the indoor enclo- 
sures. During the experiments, all apes could access the off-sight quarters 
as well as both the indoor and outdoor enclosures as a group. 


Testing materials 


Both during the demonstrations and the tests we used two puzzle boxes 
(the tendon box and the hide box), three artificial hammerstones fixed to 
the enclosure bars using chains and a fixed chert core. The tendon box 
was modeled on an earlier version described by Wright (1972) and Toth 
et al. (1993) and consisted of two opaque boxes secured to a wooden 
board (Fig. 2). The tendon box had a clear Plexiglas window (5 cm x 
16 cm) at the top that allowed the apes to see the reward inside one of the 
boxes. The door of the reward box was pulled shut by a rope that ran 
through the inside and exited through a hole in the opposite end. The rope
then ran between the two boxes for approximately 5 cm and entered the
second (non-rewarded) box. Thus, the rope was only accessible in the
area between the two boxes and had to be cut there to allow the door of
the reward box to open.

Fig. 2. Illustration of the tendon box and its opening mechanism when using a flake to sever the rope lock. Illustration by Nuria Melisa Morales Garcia.

Fig. 2 panel labels: Tendon-like rope; Reward; 1. The stone tool is used to cut the rope; 2. The tension in the rope is released and the door opens.

The hide box was designed based on an apparatus used to test capuchin
monkeys (Cebus apella) (Westergaard and Suomi 1994) and consisted
of a transparent Plexiglas cylinder (16 cm wide x 15.5 cm high)
with a metallic rim (Fig. 3). A silicone membrane 2 mm thick was
screwed in between the cylinder and the rim, blocking the access to the
reward placed inside the cylinder. Three artificial hammers were made
out of concrete and used during demonstration tests (small: 12 cm length
x 9 cm width, 2 kg; medium: 15 cm length x 10 cm width, 2.5 kg; large: 
18 cm length x 11 cm width, 3 kg). The hammers had an overall potato 
shape and were built around a metallic scaffold linked to a chain that 
allowed fixing the hammers to a wooden platform. The hammers were 
tested a priori on cores equivalent to the ones used in the experiments to 
ensure that they allowed flake removal from the cores. The concrete used 
to build the hammers included particles of up to 0.5 cm in diameter 
(Fig. 3). Retouched Norfolk Chert cores were used for the demonstrations
and tests. The cores were modified to display angle variability
between ~90 degrees and ~40 degrees. Different cores were used for the
demonstrations and the tests. If a core was not modified during a test, the
core was used in further tests. Due to safety regulations, the core had to
be fixed on a metallic platform (20 cm x 20 cm x 2 cm) to prevent the
apes from carrying the core into the indoor enclosure (similar to Wright
1972). The core was attached to the platform using a metallic wired mesh
with a hole width of 50 mm and wire diameter of 3 mm from XTEND
(Carl Stahl ARC GmbH, Architectural Cables and Mesh Systems). This
fixing system left a portion of the core exposed (Fig. 3). The core was
attached to the platform ensuring that the acute angle from which flakes
could be detached was facing up.

Fig. 3. Top panel: The three (chained) hammerstones and the chert core used during the experiments fixed on the metallic platform with a metallic mesh. Bottom panel: The two puzzle boxes (drum box and tendon box) used during the experiments. All materials were mounted onto two wooden platforms with a metallic frame that allowed us to fix the materials to the walls of the testing quarters. Pictures by Alba Motes-Rodrigo.


Experimental procedure 


Demonstrations to all apes were made in a group setting. The demonstra- 
tions to the gorillas took place through a glass wall in the indoor enclo- 
sure while the sleeping quarters were being cleaned and before the zoo 
opened to the public. The demonstrations to the chimpanzees took place 
through a mesh in the service aisle in front of the off-sight quarters where 
the tests took place and which the chimpanzees were free to access out- 
side of cleaning hours (see Fig. 10 in Neadle et al. 2020). An individual 
was considered to have observed a demonstration when his/her head was 
oriented towards the demonstrator (AMR) during the entire demonstra- 
tion. If the individual moved away or stopped looking during the demon- 
stration, the demonstrator stopped and started again from the beginning 
once the individual was again paying attention. A spreadsheet of which 
individuals had observed which demonstrations was continuously 
updated after each demonstration (this was necessary to ensure that the 
chimpanzees saw a minimum of three demonstrations before their first 
test and to count how many demonstrations each gorilla saw, see below). 
The identity of the individuals that observed each demonstration was 
confirmed by the keepers present during the demonstrations. If the 
experimenter was not sure if an individual had seen a full demonstration, 
it was assumed that he/she had not and the demonstration was repeated. 
During all demonstrations, the wooden platforms where the testing mate- 
rials were fixed were placed on the floor between the apes and the dem- 
onstrator, so the actions of the demonstrator were clearly visible from 
where the apes were. 

Each demonstration consisted of the production of one flake by strik- 
ing the stabilized core on the fixing platform with one of the artificial 
hammers. This was followed by the use of the produced flake to open one 
of the puzzle boxes and obtain the food reward. This knapping technique 
was employed in order to demonstrate to the apes the flake production 
method that later was going to be available to them during the tests 
(unlike Wright 1972). Only one flake was made in each demonstration 
and flakes were not reused between demonstrations. After detaching a
flake, the demonstrator held it in front of the observing apes to ensure
that they had seen the flake. The subsequent demonstration of flake
use did not start until all apes present had seen the flake (i.e., their heads
were oriented towards the demonstrator while she was holding the flake).
Demonstrations of flake use were conducted with both puzzle boxes (see 
below). When demonstrating how to open the tendon box, the demon- 
strator used the flake she had produced immediately before to cut the 
rope that kept the door of the box closed. When demonstrating how to 
open the hide box, the demonstrator used a flake she had produced imme- 
diately before to cut through a plastic sheet placed in the same position as 
the silicone membrane would be placed during the actual tests. We used 
plastic sheets instead of silicone membranes during the demonstrations 
due to the limited availability of silicone membranes and the high 
number of demonstrations. When obtaining the reward, the demonstrator 
made sure that the ape saw it by taking the reward out of the box and 
showing it to the observing apes. After each demonstration, the boxes 
were re-baited with the same reward and closed. Cores used during the 
demonstrations were exchanged for new cores before the apes had access 
to the testing materials. 

The demonstrations involved all possible combinations (N=9)
of hand (left, right, both) and hammer type (small, medium, large).
Each of the nine combinations was demonstrated twice per test box 
(3 hand combinations x 3 hammers x 2 boxes x 2 rounds of demonstra- 
tions = 36 demonstrations) before the start of Test 1. As all demonstra- 
tions had to be made in the presence of a keeper to comply with zoo reg- 
ulations, each round of demonstrations was spread over at least two days, 
depending on the keeper’s availability. A maximum of two cores were 
used per demonstration day and these were exchanged when their knap- 
pable surfaces were exhausted. 
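
The demonstration count described above can be reproduced with a short enumeration. This is only an illustrative sketch: the factor levels come from the text, while the variable and function names are our own assumptions and the order in which demonstrations were actually performed is not specified in the chapter.

```python
from itertools import product

# Factor levels taken from the text; all names here are illustrative.
HANDS = ("left", "right", "both")
HAMMERS = ("small", "medium", "large")
BOXES = ("tendon box", "hide box")
ROUNDS = (1, 2)

def demonstration_schedule():
    """Return every (round, box, hand, hammer) demonstration as a tuple."""
    return list(product(ROUNDS, BOXES, HANDS, HAMMERS))

schedule = demonstration_schedule()
# Unique hand x hammer combinations, ignoring box and round.
hand_hammer_pairs = {(hand, hammer) for _, _, hand, hammer in schedule}

print(len(hand_hammer_pairs))  # 9 combinations
print(len(schedule))           # 36 initial demonstrations
```

The enumeration order here is arbitrary and does not represent the sequence used in the zoo.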

For the gorillas, the experiments were structured into the following 
phases: 

Initial Demonstrations (N=36) - Test 1 - Test 2 - Repeated Demon- 
strations (N=12) - Test 3 - Test 4. Each gorilla saw at least six demonstra- 
tions before the first test. 

For the chimpanzees, the experiments were structured into the fol- 
lowing phases: 

Initial Demonstrations (N=36) - Test 1 - Test 2 - Repeated Demon- 
strations (N=18) - Test 3 - Test 4 - Test 5 - Test 6. As some of the chim- 
panzees rarely entered the off-sight quarters where demonstrations were 
taking place, we proceeded to Test 1 when at least 80% of the chimpan- 
zees had seen a minimum of three demonstrations (Table 1). Tests 5 and 6 
were implemented given the results of Tests 1 and 3.
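
The criterion for starting Test 1 with the chimpanzees (at least 80% of individuals having seen a minimum of three demonstrations) can be expressed as a simple check over the observation log. The sketch below is hypothetical: the function name, the dictionary layout, and the example counts are assumptions for illustration and are not the recorded data.

```python
def ready_for_test_1(demos_seen, min_demos=3, min_fraction=0.8):
    """demos_seen maps subject ID -> number of full demonstrations observed."""
    if not demos_seen:
        return False
    qualified = sum(1 for n in demos_seen.values() if n >= min_demos)
    return qualified / len(demos_seen) >= min_fraction

# Hypothetical counts for a group of 13 (NOT the values recorded in the study).
example_counts = [6, 3, 18, 3, 3, 8, 11, 3, 0, 0, 3, 6, 4]
example_log = {f"CH{i}": n for i, n in enumerate(example_counts, start=1)}

print(ready_for_test_1(example_log))  # True: 11 of 13 individuals qualify
```

With these made-up counts, two individuals fall short of three demonstrations, which still leaves the group above the 80% threshold.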

Table 1 (caption fragment): ... between Tests 2 and 3 as a reminder (repeated).

Hammer    Chimpanzee    Demonstrations (N)    Phase
small     CH2           1                     repeated
medium    CH2           2                     repeated
large     CH2           3                     repeated
small     CH3           6                     initial
medium    CH3           6                     initial
large     CH3           6                     initial
small     CH3           3                     repeated
large     CH3           3                     repeated
medium    CH3           1                     repeated
small     CH4           3                     initial
small     CH5           3                     initial
large     CH5           1                     repeated
small     CH6           6                     initial
medium    CH6           2                     initial
medium    CH6           2                     repeated
small     CH7           7                     initial
medium    CH7           4                     initial
small     CH7           2                     repeated
medium    CH7           3                     repeated
large     CH7           2                     repeated
small     CH8           2                     initial
medium    CH8           1                     initial
medium    CH8           3                     repeated
large     CH8           4                     repeated
small     CH9           1                     repeated
small     CH10          1                     repeated
large     CH12          3                     initial
small     CH12          3                     initial
small     CH11          3                     initial
medium    CH11          2                     repeated
large     CH11          1                     repeated

Repeated demonstrations were implemented after Test 2 to account
for any potential effects of the delay since the initial demonstrations.
These demonstrations were meant to act as reminders of the task and
solutions. Each test took place on a consecutive day and the testing materials
were only available to the apes during the tests. Apes participated in
a maximum of one test per day, which lasted for several hours (between 2
and 3 hours in the case of chimpanzees and approximately 2 h in the case
of gorillas) on a voluntary basis. The testing materials (including the
baited boxes) were placed inside the testing quarter before each test as 
described above (Fig. 3) and all individuals were free to participate. All 
demonstrations and tests were recorded with two Sony HDR-CX330E
Handycam video cameras. From the video recordings of each test, the
demonstrator later coded all active interactions that the chimpanzees and
the gorillas performed with the testing materials. An interaction started
when the ape came into physical contact with the testing materials and
finished when a) the contact ceased, b) the contact paused for more than
three seconds or c) the interaction type changed (the part of the materials
explored changed). We only considered active interactions, meaning that stationary
contact (such as placing and resting a hand on the materials, sitting or
lying down on the materials) was not coded. From each interaction we 
coded: 1) the identity of the individual; 2) the testing material that the 
individual interacted with (core, flake, hammer, hide box, tendon box); 3) 
if the interaction took place manually, using the mouth or a tool; 4) the 
type of tool and 5) the duration of the interaction. 
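
The coding rule above can be illustrated with a small segmentation routine. This is a sketch under stated assumptions, not the procedure the coder used: we assume contact bouts are already timestamped per individual, that pauses of three seconds or less are bridged into one interaction, and that a change of material stands in for a change of interaction type. The `Bout` class and all names are hypothetical.

```python
from dataclasses import dataclass

MAX_PAUSE_S = 3.0  # pauses longer than this end an interaction (from the text)

@dataclass
class Bout:
    start: float    # seconds from the start of the test
    end: float
    material: str   # e.g. "core", "hammer", "hide box", "tendon box"

def segment_interactions(bouts):
    """Merge time-ordered contact bouts of one individual into interactions."""
    interactions = []
    for bout in sorted(bouts, key=lambda b: b.start):
        last = interactions[-1] if interactions else None
        same_material = last is not None and last.material == bout.material
        short_pause = last is not None and bout.start - last.end <= MAX_PAUSE_S
        if same_material and short_pause:
            last.end = max(last.end, bout.end)  # bridge the short pause
        else:
            interactions.append(Bout(bout.start, bout.end, bout.material))
    return interactions

# Hypothetical bouts for a single ape.
bouts = [Bout(0, 4, "hammer"), Bout(6, 9, "hammer"),
         Bout(9.5, 12, "core"), Bout(20, 22, "core")]
result = segment_interactions(bouts)
durations = [(i.material, i.end - i.start) for i in result]
print(durations)
```

In this example the two hammer bouts merge into one interaction (the pause is two seconds), the switch to the core starts a new interaction, and the eight-second gap separates the two core interactions.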


RESULTS 
Chimpanzees 


Twelve out of the 13 chimpanzees tested interacted at least once with the 
materials during the tests. The total number of interactions was 1025. The 
total number of interactions per individual varied from two to 199 (mean
number of interactions ± SD = 85 ± 65) and the number of interactions per
test varied from 473 during Test 1 to 68 during Test 5 (Fig. 4). The chimpanzees
interacted the most with the baited boxes (hide and tendon box,
Fig. 5). One of the chimpanzees learnt to open the hide box manually by
pulling on the edge of the silicone membrane and did so in several tests.
The manual opening of the box sometimes took place when other individuals
were present in the testing quarter. No other chimpanzee was successful
in opening either of the baited puzzle boxes. Although the hide box was empty
after it had been manually opened, most chimpanzees still interacted with
it (including the individual that learnt how to open it manually,
Fig. 5), suggesting that the food rewards were not the only motivator of
the chimpanzees’ exploratory behavior.

Fig. 5. Frequencies of interaction with the different materials across tests performed by the chimpanzees.

Most of the interactions performed by the chimpanzees with the test- 
ing materials were manual (N=970), although the chimpanzees also used 
their mouths to try to open the boxes (N=5) as well as several tools that 
they brought from the indoor enclosure (N=50). The chimpanzees per- 
formed 20 interactions using sticks and five using pieces of straw 
obtained from the indoor enclosure. The nature of the tools used in the 
remaining 25 interactions could not be identified from video recordings. 

Twelve chimpanzees touched the rope of the tendon box a total of 52 
times. Of these, only two individuals (CH3 and CH11) interacted with 
the rope more than five times (13 and ten times, respectively). Most interactions
with the rope took place by hand (N=45) or using the teeth
(N=5), but on two occasions a piece of straw was used in an apparent
(unsuccessful) attempt to break the rope. The straw pieces were looped around
the rope of the tendon box and used to pull the rope upwards.
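
As a quick arithmetic check, the chimpanzee counts reported above add up as expected. The figures below are taken from the text; the variable names are our own, and the assumption that the reported mean was computed over the 12 individuals that interacted (rather than all 13 tested) is ours.

```python
# Counts reported for the chimpanzees; names are illustrative.
chimp_total = 1025
by_modality = {"manual": 970, "mouth": 5, "tool": 50}
by_tool = {"stick": 20, "straw": 5, "unidentified": 25}
rope_touches = {"hand": 45, "teeth": 5, "straw": 2}
rope_total = 52
interacting_individuals = 12  # assumption: mean taken over interacting apes only

assert sum(by_modality.values()) == chimp_total
assert sum(by_tool.values()) == by_modality["tool"]
assert sum(rope_touches.values()) == rope_total

mean_per_individual = chimp_total / interacting_individuals
print(round(mean_per_individual))  # 85, matching the reported mean
```

Dividing by all 13 tested individuals would instead give a mean of about 79, which is why the sketch assumes the 12 interacting apes as the denominator.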

The chimpanzees interacted with the hammers a total of 155 times. Of 
these interactions, 12 involved percussion, defined as the use of tools to 
strike surfaces or objects (Whiten et al. 2009). On eight of these occasions,
the chimpanzees used a hammer to hit the wooden platform holding
the core and hammers (CH2: M, 33 years old at the time of testing,
human-reared; CH7: F, 31, human-reared; CH12: M, 40, rearing 
unknown). On four of these occasions, one chimpanzee used a hammer 
to hit another hammer (CH12). Two chimpanzees also forcefully rolled 
the hammers on their side without lifting them (N=6; CH2 and CH7), 
which caused a hammer to impact with the core. Eight times, two chim- 
panzees (CH7 and CH12) made some contact with the core with a (hand- 
held) hammer. Two of these interactions involved percussion. CH7 was 
the first individual to use the small hammer to hit the middle section of 
the core once during Test 1. During Test 3, CH12 struck the core four times
with the middle-sized hammer, targeting the mesh-covered section
of the core (Fig. 6). None of these interactions led to flake detachment as 
the strength employed to strike the core was insufficient and the target of 
the percussive actions was not the exposed acute angle of the core. 


Fig. 6. CH12 striking the middle section of the core fixed on the wooden platform using the middle hammer with the right hand during Test 3. Magnified is the hammer striking the mesh-covered midsection of the core from which no flake could be detached.


Gorillas 


No attempt at stone tool-making was ever observed in the gorillas. All six 
gorillas interacted at least once with the materials provided. The total 
number of interactions was 380. The total number of interactions per 
individual ranged from eleven to 208 and the number of interactions per 
test varied from 34 on Test 4 to 161 on Test 2 (Fig. 7). The gorillas inter- 
acted the most with the baited boxes (hide and tendon box, Fig. 8) fol- 
lowed by the hammers and lastly the core. 

Most of the interactions performed by the gorillas with the testing
materials were manual (N=341), although the gorillas also used their
mouths to seemingly try to open the boxes (N=19). In addition, the gorillas
used several sticks that they brought from the indoor enclosure as probing
tools to interact with the boxes (N=20). Four of the gorillas
touched the rope of the tendon box a total of 41 times (across subjects).
Of these, only two individuals (the juvenile and the infant) interacted
with the rope more than five times (22 and 17 times, respectively). Most
interactions with the rope took place by hand (N=26) or using their teeth
(N=14), and on one occasion a stick was pressed down against the rope.
The gorillas did not engage in any type of percussive activity that
involved the hammers. However, they did play with the hammers by rolling
them on the platforms eleven times in total across individuals. In
Tests 1, 2 and 4, one adult female managed to open the hide box by hand
by pulling on the silicone and ripping it. The manual opening of the box
sometimes took place when other individuals were present in the testing
quarter. Other individuals nevertheless explored the box
frequently even though it was empty. No other gorillas were successful in
opening either of the two puzzle boxes.

Fig. 7. Number of interactions performed by the different gorillas in each test. N indicates the number of individuals that interacted with the testing materials in each test and the dots are color-coded by individual. F = adult female.


DISCUSSION 


Two out of 13 tested chimpanzees (both of whom were potentially encul- 
turated) but no gorilla out of six tested used a hand-held artificial hammer 
to strike a fixed core in a variant of the bipolar technique. These striking 
actions performed by the chimpanzees took place after observing human 
demonstrations of how to make and use sharp stone tools. These two 
instances of percussion represent a very small fraction of the 1025
interactions observed in this chimpanzee group. No sharp-edged stone
detachment took place as a consequence of these two striking actions. 
This lack of stone detachment was probably caused by the use of insuffi- 
cient force by the chimpanzees. Additionally, in these two instances of 
percussion, the chimpanzees did not target the exposed area of the core 
that had suitable angles for knapping, but instead hit the center of the core 
(which was covered by a metal mesh). 

The lack of tool use in wild populations alone would not explain the 
negative results of our experiments with captive gorillas. Although goril- 
las generally do not use tools in the wild (though see Breuer et al. 2005), 
they have been reported to use tools with proficiency similar to that of other
apes in captivity (Shumaker et al. 2011). On the other hand, chimpanzees 
are the great ape species with the most varied tool-use repertoire in the 
wild and the only species of ape (so far) that uses stones as tools in the 
wild (Biro et al. 2006; Boesch and Boesch 1984). Stone tool use (includ- 
ing percussion) in foraging contexts has been described in multiple pop- 
ulations of wild chimpanzees (Carvalho et al. 2008; Koops et al. 2010; 
Luncz et al. 2012) but this behavior is relatively rare or absent among 
captive, untrained individuals even when stones are provided as testing 
materials (Arroyo et al. 2016). A previous study investigating the sponta- 
neous abilities of chimpanzees to acquire sharp stone tool-making and 
using skills during baseline tests (in the absence of demonstrations) 
showed that captive, unenculturated chimpanzees from two different 
groups do not spontaneously engage in lithic percussion (Bandini, 
Motes-Rodrigo, et al. 2021). 

Although rare, the two observations of lithic percussion collected 
during the present study could indicate that the chimpanzees acquired 
some action information regarding bipolar knapping from the demonstra- 
tions (i.e., know-how copying). However, alternative explanations are 
also possible. Given the published literature on chimpanzees’ lack of 
spontaneous know-how copying (Motes-Rodrigo et al. 2021; Neadle et 
al. 2020; Tennie et al. 2012), the two chimpanzees in our study that 
engaged in stone percussion could have socially learnt from the
demonstrations not the know-how of the underlying behavior, but the
target of percussion (know-where/what). The social learning mechanism
underlying this acquisition of information would thus have been
stimulus and/or local enhancement (acquisition of know-what and/or
know-where). The use of the artificial hammers as active elements
could then be explained by the fact that these were the only mobile
materials available in the testing quarter (i.e., the know-what to use
as hammers was a given).

Of the two chimpanzees that performed percussive actions targeted 
towards the core, one of them (CH7) had been hand-reared by keepers as 
an infant and the other (CH12) had an “unknown background.” It is 
therefore possible that both (but at least one) of the two chimpanzees that 
performed percussive activities were somehow enculturated due to 
extensive exposure to human contact during an extended period of time 
(Furlong et al. 2008; Henrich and Tennie 2017). This is problematic for 
making phylogenetic and species-wide inferences. Extensive human 
contact has been shown to affect both brain connectivity and cognition— 
including motivation and/or ability to copy behavioral forms or action 
know-how (Pope et al. 2018; Tomasello et al. 1993; Tomasello and Call 
2004). Enculturated apes have been repeatedly shown to possess 
enhanced copying abilities compared to unenculturated apes (including 
know-how copying, Call 2001; Call and Tomasello 1996; Custance et al. 
1995). Indeed, the only two studies that claimed that great apes could 
acquire sharp-edged stone tool-making skills (by copying a human 
model) tested enculturated or semi-enculturated individuals who had 
extensive human training and contact throughout their lives (Toth et al. 
1993; Wright 1972). Therefore, enhanced attention towards humans 
and/or the enhancement/training of certain cognitive abilities (such as 
know-how copying, compare Heyes 2018) via enculturation could 
explain the positive results of earlier ape knapping studies (see introduc- 
tion) and the two observations of chimpanzee lithic percussion in the 
present study. However, if this were the case, it remains unanswered why 
the other hand-reared individuals in the group did not perform percussive 
activities after being exposed to demonstrations. A potential explanation 
for this latter observation could be that a combination of human exposure 
during rearing, a more neophilic personality or differences in motivation 
levels (Forss et al. 2020; Hopper et al. 2014) led only two individuals to 
perform percussive actions (generally, animal behavior remains to some 
degree stochastic, even after training and enculturation). Testing hand- 
reared (and especially enculturated) individuals also incurs some of the 
limitations of human knapping studies highlighted in the introduction 
section. For instance, it might be difficult to confirm that
hand-reared and/or enculturated apes are naive to the behaviors or
materials of interest. In such cases, enculturated apes (but not
unenculturated apes, Clay and Tennie 2017) might apply a previously
known behavior to a new context, material or task after seeing a
demonstration (i.e., contextual imitation).
However, the hand-reared individuals in our study were hand-reared at 
zoological institutions, and after interviewing the zoo keepers we could
confirm their naivety towards the testing materials and target behaviors. 

Yet, even if we were to consider both chimpanzees that engaged in 
lithic percussion in this study as enculturated, our subjects and the sub- 
jects tested in previous ape knapping studies (possibly Abang, but cer- 
tainly Kanzi) would clearly differ in their degree of enculturation. Kanzi 
the bonobo is one of the most famous cases of an ape brought up in a 
human social environment and exposed to language experiments 
throughout his life. Kanzi is renowned for having received extensive 
training to communicate with humans using lexigrams (Savage-Rum- 
baugh et al. 1986) and was accustomed to being in the same room as 
researchers and interacting with them. Although the orangutan tested
by Wright (Abang) was neither specifically nor intentionally trained
to communicate with humans or copy them, he was often in very close
contact with humans (which can provide unintentional training across
domains), as reported by a keeper of the ape team at the time of the
experiments conducted by Wright. Some of the chimpanzees we tested here were reported
to be hand-reared and other subjects (including gorillas) had unknown 
backgrounds. However, it is unclear what degree of enculturation those
apes would have, as it is not described what hand-rearing involved or
for how long (if at all) the hand-reared chimpanzees (and perhaps some
gorillas) were separated from their group and in close, direct contact
with humans. In any case, the chimpanzees included in our study would have
only lived in close proximity to humans during their infancy as they were 
introduced into the conspecific group when they were juveniles. Conse- 
quently, their degree of enculturation would be lower (semi-enculturated) 
than that of Abang or Kanzi (enculturated), who lived in human-cultural 
environments even in adulthood. This difference in enculturation degree 
could partly explain the different results obtained between studies, sug- 
gesting that an elevated level of enculturation (perhaps including exten- 
sive human exposure) is necessary for apes to acquire sharp stone tool- 
making and use — either on their own or triggered/copied from human 
demonstrations (compare Motes-Rodrigo et al. 2022). 

The results presented here also contrast with one of our previous 
studies testing the social learning abilities of three stone-tool naive, 
untrained and unenculturated orangutans to learn from human
demonstrations how to make and use sharp stone tools (Motes-Rodrigo et
al. 2022). In this previous experiment, we employed a similar testing
set-up and materials equivalent to those employed in the present study. In the
orangutan experiment, one young female (O1) engaged in frequent per- 
cussive behavior (N=19 events) using a hand-held hammer to strike a 
platform holding the hammers, the walls of the enclosure and the fixed 
core. In these core-directed strikes, and contrary to the chimpanzees we 
tested here, the target of percussion was the exposed area of the core (i.e., 
the only region from which sharp stones could have been detached). In 
addition, percussive actions of the orangutan using the hammers to strike 
against the walls and holding platforms led to the detachment of several 
pieces from the hammers, some of which were sharp enough to be used
as cutting tools. Nevertheless, the orangutan did not use these pieces as 
tools. Previous experiments included in that same study on a different 
orangutan population showed that orangutans spontaneously engage in 
lithic percussion during baselines without the need to observe human 
demonstrations of percussion. These spontaneous percussive events 
involved the use of both hammers and the core as active elements to 
strike the walls and floor of the testing quarter. In one trial, the core was 
used as an active element when an individual removed it from its fixing 
platform. This passive hammer technique¹ led to the detachment of three
sharp stone fragments. No observations were made of orangutans sponta- 
neously using the hammers to strike the core in the absence of human 
demonstrations. Given this spontaneous, untrained expression of lithic 
percussion by naive orangutan individuals (see also examples in other
orangutan populations, Bandini, Grossmann, et al. 2021; and in capuchins,
Westergaard and Suomi 1994), it is likely that O1 did not acquire the
know-how involved in lithic percussion from the demonstrations but 
instead socially gathered and used information regarding the target 
(know-where/know-what) of percussion. 

The results of the present and previous studies on ape knapping point 
to several factors influencing the acquisition process of sharp stone-tool 
production by apes. Here, stone tool-making or production should be 
understood as the general ability to detach sharp pieces from a core or 
from a (human-intended) hammerstone rather than the production of 
sharp artifacts by specifically striking a platform oriented at an acute
angle relative to a high mass zone (acute angle rule, Moore 2020), for 
which there is no evidence in apes. A combination of enculturation, train- 
ing/molding and the provision of human demonstrations might allow 
some apes (at least one bonobo and one orangutan) to reliably acquire 
sharp stone tool-making and -using abilities during individual tests over 
extended periods of time. This combination of subject characteristics and 
testing methodology seems to be required for apes to learn these skills 
considering the results presented here. When hand-reared chimpanzees 
(in conspecific groups with unenculturated individuals) were provided 
with human demonstrations, the observed stone-related behaviors were 
much rarer and limited to only certain actions (i.e., occasional percus- 
sion) that are not outside the spontaneous abilities of naive individuals 
(Bandini, Grossmann, et al. 2021; Motes-Rodrigo et al. 2022). In orangu- 
tans, it seems that human demonstrations (provided in an individual set- 
ting) are not sufficient for the acquisition of the complete sequence of 
stone tool-making and -use when the individuals are not enculturated 
(Motes-Rodrigo et al. 2022). Thus, one potential explanation for the
different results obtained in the present compared to previous studies
with positive results (Toth et al. 1993; Wright 1972) is the specific
degree of enculturation. Unenculturated apes can sometimes show some
elements of the knapping sequence and even sharp stone tool use
(Motes-Rodrigo et al. 2022) but these skills are shown rarely and
unreliably, and so far by single individuals.


¹ As suggested by one of the reviewers of this manuscript, the passive
hammer technique could also be referred to as the active core technique.

Overall, ape knapping experiments suggest that elevated levels of
enculturation (perhaps in addition to human demonstrations² and/or
training) are required for apes to reliably express stone tool-making
skills. In addition, modern task-naive, unenculturated and untrained apes 
cannot individually or socially learn the complete sequence of behaviors 
involved in knapping, including the use of the resulting sharp stones. 
However, some of these experiments have revealed that at least some 
species of apes can spontaneously acquire some elements of this 
sequence (lithic percussion and the use of a sharp stone as a cutting tool, 
separately, Motes-Rodrigo et al. 2022). These results open the possibility 
that our hominid ancestors possessed the anatomical and cognitive abil- 
ities to engage in and learn certain behavioral prerequisites of lithic tech- 
nologies in the absence of know-how copying. For example, hominids 
could have unintentionally produced sharp-edged stones as a by-product 
of percussive activities (e.g., during foraging, play or another
activity, as occurs in other species; Proffitt et al. 2016). In
addition, it is possible that they could also have spontaneously used
flakes of anthropogenic origin (unintentional sharp-edged stones
produced as a by-product of percussive activities; perhaps even by
other species) and/or flakes and/or unmodified stones of
non-anthropogenic origin (e.g., cliff falls, Barnes 1939) as cutting
tools (e.g., Mountford 1941). Later hominins could have then combined
these abilities, leading to the emergence of intentional knapping.

Despite these and other advances in non-human and human primate 
archaeology, the emergence process of the complete hominin knapping 
sequence, including intentional sharp-edged stone production and use, 
remains speculative. Similarly, it is unclear whether intentional knapping 
evolved as a response to necessity, opportunity, relative profitability or a 
combination of these (Koops et al. 2014; Rutz and St Clair 2012). Knap- 
ping skills could have emerged in our lineage when the degree of terres- 
triality increased and the foraging niche expanded (Koops et al. 2014), 
including the exploitation of larger prey and processing of hard under- 
ground storage organs of plants (Marchant and McGrew 2005), which 
could have increased the need or relative profitability of sharp stone tools 
(Shea 2011). We are hopeful that future investigations will be better
able to determine both why and when these skills emerged in our
ancestors. However, even if we find the answers to these questions,
the puzzle will remain of when and why the acquisition of sharp
stone-tool production and use started to depend on know-how copying
skills (Bandini, Motes-Rodrigo, et al. 2021; Motes-Rodrigo et al.
2022; Snyder et al. 2022; Tennie et al. 2016, 2017).


² Please note that baseline tests (in the absence of demonstrations)
with highly enculturated apes are still pending.




CHAPTER SIX


A comparative multimodal perspective on the evolutionary origins of
tool use and handedness


Ammie K. Kalan¹


Abstract 


Laterality and the evolution of handedness have been the subject of
significant scholarly investigation across a wide variety of
disciplines, including animal behaviour, neurobiology, developmental
psychology, archaeology, and language evolution. Despite this
longstanding attention, there remains no clear consensus on how and
why laterality, and by extension handedness, evolved. Here I review
comparative research on handedness in nonhuman primates to draw
attention to the leading theories in the evolution of laterality as
they relate to tool use and language origins. In doing so, I aim to
provide an overview of our current understanding of the factors
influencing handedness and the potential insight that further study of
nonhuman primates, particularly wild great apes, could contribute to
ongoing discussions. Moreover, drawing on recent studies in both human
knapping and chimpanzee stone tool use behaviour, I advocate for a
multimodal approach to investigations of handedness, one where sound
is integrated into existing paradigms examining laterality in tool use
behaviour. Such a perspective has the potential to reveal novel
insights into the auditory information that may have aided our hominin
ancestors at the advent of their lithic technical revolution.


INTRODUCTION 


Contemporary human populations exhibit a population-level right- 
handed bias for manual actions. Although there is some variability across 
cultures and the type of action (for excellent recent reviews see 
McManus 2019; Prieur et al. 2019), the overwhelming predominance of 
right-handedness among humans remains a complex and multifaceted 
trait with significant implications for the evolution of brain lateralization, 
particularly with regards to left hemispheric specialization for language 
(Stout et al. 2008). Both the language production and perception centers 
of the brain are predominantly found in the left hemisphere of the human
brain although, again, there is variability (Fitch and Braccini 2013;
McManus 2019; Prieur et al. 2019; Stout and Chaminade 2012). Since
brain imaging techniques, including recent scanning advancements, are
considerably more invasive than observations of hand-use, handedness
remains one of the principal means by which researchers can easily
investigate laterality. It is therefore unsurprising that much of the
handedness research on nonhumans has been conducted in mice, where
invasive brain imaging techniques are often applied (McManus 2019;
Warren 1980). However, there similarly exists a large body of
literature on handedness in nonhuman primates, although invasive
techniques are more difficult to apply to primates due to increased
ethical concerns, particularly with regard to wild populations.


¹ Department of Anthropology, University of Victoria, Victoria BC, Canada.

© 2023, Kerns Verlag / https://doi.org/10.51315/9783935751384.006

Cite this article: Kalan, A. K. 2023. A comparative multimodal
perspective on the evolutionary origins of tool use and handedness,
ed. by F. A. Karakostis, G. Jager, pp. 123-142. Tübingen: Kerns
Verlag. ISBN: 978-3-935751-38-4.


WORDS, BONES, GENES, TOOLS: DFG CENTER FOR ADVANCED STUDIES

The cumulative research on laterality and handedness suggests that 
genetic, environmental, as well as social and cultural influences likely 
contribute to the development and persistence of right-handedness 
among humans (McManus 2019). However, there is still a lack of clear 
consensus on the relative importance of these factors and exactly how 
they might interact with one another (McManus 2019; Prieur et al. 2019). 
Therefore, a comparative approach permits the investigation of the evo- 
lutionary selection pressures that may have helped shape laterality and 
population-level handedness. More specifically, ancestral and living 
hominids can provide insight into the original contexts and benefits asso- 
ciated with population-level handedness and how this came to be so 
tightly linked with hemispheric laterality (Cashmore, Uomini, and Chap- 
elain 2008). However, as noted by previous scholars, comparisons across 
and within species are often hindered by inconsistent application of 
methods and terminology (e.g., Cashmore, Uomini, and Chapelain 2008; 
McGrew and Marchant 1997). 

In the present chapter, I aim to provide a brief overview of
comparative research on handedness of nonhuman primates, with a
central focus on great apes, and how they fare in comparison to
contemporary humans. In doing so, my intention is twofold: 1) to
highlight the gaps in our knowledge with regard to the evolution of
population-level handedness, and 2) to present opportunities for
future research directions by proposing an integrated multimodal
framework for investigating handedness (and by extension laterality)
in the tool-use behaviour of humans and nonhumans alike. Importantly,
the term laterality can refer to the dominance of one side of the body
or brain, but in the current chapter I am primarily concerned with
motor lateralization and refer to brain lateralization explicitly in
the text where relevant. With regard to motor lateralization, I
clearly differentiate whether I am referring to individual-level or
population-level laterality in hand use (i.e., handedness). I also use
the term hand preference to specify intra-individual hand-use
patterns, which are altogether quite different from a population-level
bias (Cashmore, Uomini, and Chapelain 2008; McGrew and Marchant 1992).
Population-level handedness is where the same hand preference is
consistently observed across individuals within a specified population
or group.


THE EVOLUTION OF HANDEDNESS


Many primatologists have been captivated by the concept of
population-level handedness given the obvious insights it might
provide regarding the evolution of universal right-handedness amongst
humans
(Cashmore, Uomini, and Chapelain 2008; Fitch and Braccini 2013; 
Hopkins 1996). McGrew and Marchant (1997) provided a thorough 
review of primate laterality studies on both strepsirrhines and hap- 
lorrhines which clearly demonstrated that the majority of studies avail- 
able at that time suffered from methodological and/or statistical 
constraints, thereby prohibiting robust species-wide inferences. In the 
studies they were able to assess, McGrew and Marchant (1997) over- 
whelmingly found little evidence for population-level lateralization in 
nonhuman primates although chimpanzees showed some tendency for a 
right-handed population-level bias in captivity. The lack of standardized 
methods and analyses, as well as the fact that the majority of handedness 
research involved captive populations, clearly contributed to the
inconsistent and variable findings of laterality and handedness. A
more recent review by Fitch and Braccini (2013) on the evolution of
handedness still echoes the early concerns highlighted by McGrew and
Marchant (1997) regarding small sample sizes and weak statistical
effects, and concludes that there is only tenuous evidence for
laterality and handedness in nonhuman primates. Therefore, handedness
continues to draw great research interest
due to the often contradictory and inconsistent results of previous studies 
(Fitch and Braccini 2013). 

One of the leading hypotheses for the evolution of motor lateraliza- 
tion is the “postural origins” hypothesis whereby the right hand is postu- 
lated to have been the dominant grasping hand during locomotory and 
positional behaviours of an ancestral arboreal primate. With increasing 
terrestriality in the evolution of primates, the right hand would have been 
subsequently co-opted as the flexible gripping hand, where bimanual 
actions and object manipulation would have become more frequent 
(MacNeilage, Studdert-Kennedy, and Lindblom 1987). The hypothesis 
might then predict that left-handed bias is the ancestral condition
(more likely to be found in the basal strepsirrhines) and right-handed
bias the derived condition, yet recent studies, including phylogenetic
comparisons of a large number of primate species, do not support this
(Caspar et al. 2021). However, a recent cross-species comparison did
find that primarily arboreal primate species tend to have a left-hand
population-level bias whilst species that are primarily terrestrial
(i.e., the African apes, baboons, macaques, as well as humans) tend to
have a right-hand population-level bias, suggesting that a species’
ecology is important to consider (Meguerditchian, Vauclair, and
Hopkins 2013). Overall then, terrestriality is likely to have played a
significant role in the evolution of right-handed population bias in
primates.

Relevant to discussions on the adaptive utility of laterality is the ‘task 
complexity’ hypothesis originally developed by Fagot and Vauclair 
(1991). The authors distinguished between low- and high-level tasks,
whereby the latter were more cognitively taxing, involved precise
and/or novel movements, and were therefore more likely to be
associated with hand preferences and, ultimately, population-level
handedness (Fagot and Vauclair 1991; but see McGrew and Marchant 1997
for reasons why the novel aspect can be difficult to operationalize).
In contrast,
low-level tasks were considered to be more routine, familiar tasks (e.g., 
reaching for food). Task complexity explained some, but not all, of the 
variation in results across the primate studies reviewed by McGrew and 
Marchant (1997). More recently, however, task complexity has gained 
support in a number of primate studies comparing unimanual versus 
bimanual actions, where only the latter demonstrates consistent lateral- 
ization in hand-use (reviewed in Meguerditchian, Vauclair, and Hopkins 
2013). 

Additional theories for the evolution of laterality and handedness 
stress the significance of social interaction in shaping patterns of hand- 
use (Ghirlanda, Frasnelli, and Vallortigara 2009), and the evolution of 
bipedalism (Westergaard, Kuhn, and Suomi 1998) or tool use (Stout and 
Chaminade 2012). In particular, theories involving tool use integrate 
aspects central to the postural (e.g., freeing up the hands) and task com- 
plexity hypotheses (e.g., complex object manipulation) for the emer- 
gence and adaptive significance of handedness. I will therefore primarily 
focus on tool-use-related studies for the remainder of this chapter. 


POPULATION-LEVEL HANDEDNESS IN CAPTIVE NONHUMAN PRIMATES


Since McGrew and Marchant’s (1997) formidable review, research on
nonhuman handedness has attempted to address many of the concerns
plaguing the earlier studies they examined by using larger sample
sizes and a wider variety of tasks, by studying wild populations (see
next section), and by applying more stringent and comparable methods
for data analysis. For example, using a large sample of chimpanzees,
Hopkins and colleagues have consistently found that captive chimpan- 
zees show a population-level right-hand bias for unimanual (e.g., reach- 
ing for food) and bimanual coordination tasks (e.g., tool use or solving 
the ‘tube task’ described below), as well as for communicative, manual 
gestures (Hopkins, Mareno, and Schapiro 2019; Hopkins, Russell, 
Freeman, et al. 2005; Hopkins, Russell, Hook, et al. 2005). However, 
many studies fail to find population-level biases in ape hand-use, includ- 
ing chimpanzees, even when employing similar methods (Brand et al. 
2017; Lambert 2012; Motes-Rodrigo, Hernandez-Aguilar, and Laska 
2019; Prieur et al. 2018).

The significance of the corpus of captive research on primate handed- 
ness lies in its rich set of cross-species experiments whereby a standard- 
ized set of tasks and methods permits robust inferences. For example, a 
comparison of 777 captive great apes, including gorillas, bonobos,
chimpanzees and orangutans, found a population-level right-hand bias
for a bimanual coordinated task in all apes except the orangutan
(Hopkins et al. 2011). Instead, orangutans showed a left-hand
population bias, consistent with at least one previous study (Brésard
and Bresson 1983) but not with a more recent study in which no
population-level bias was found in orangutans (O’Malley and McGrew
2006). The strength of these captive
experiments, however, is in testing the nonhuman primates with the same 
‘tube task’, facilitating cross-species comparisons. In this task,
originally designed by Hopkins (1995) as a bimanual coordinated task,
a tube containing food is provided and one hand grasps or stabilizes
the tube while the other extracts the food item, hence necessitating
the use of both hands. Longitudinal studies of captive populations are
also noteworthy for demonstrating some degree of heritability of
handedness in chimpanzees, as
well as performance asymmetries, whereby right-handed individuals 
were more efficient in solving a tool-use task with their right hand than 
their left (Hopkins, Mareno, and Schapiro 2019). Handedness heritability 
in primates has also been suggested for the unimanual reaching actions of 
a troop of captive Japanese macaques that showed a left-handed popula- 
tion bias (Kubota 1990). However, these studies are generally limited in 
sample size compared to the strong evidence for heritability in human 
handedness (McManus 2019; Warren 1980). 

The majority of captive studies on ape handedness have focused on
bimanual coordination and tool use tasks and did not consider the
potential role or influence of communication on hand preference. More
recently, however, researchers have investigated laterality for both
communicative and non-communicative actions. For example,
investigations of handedness for communicative gestures have
shown a population-level right-hand bias in baboons, gorillas, chimpan- 
zees and bonobos (reviewed in Meguerditchian, Vauclair, and Hopkins 
2013). Other studies report a right-hand population-level bias only for
gestures involving tool use (Prieur et al. 2018). Another study,
investigating the gestural repertoire of wild chimpanzees, found no
population-level bias but an individual-level right-hand bias for gestures
involving object manipulation (Hobaiter and Byrne 2013). Therefore, the inclusion
of an object into the communicative action may be a significant factor in 
driving lateralization. 

Recent in-depth analyses of gestural laterality have provided further
insight into mediating factors, such as features of the target of the
action. More specifically, whether actions are directed at an
inanimate (e.g., an object), rather than an animate target (e.g., a conspe- 
cific or human), has been shown to influence hand preference. For exam- 
ple, one study found that young children demonstrate a right-hand bias 
only towards actions directed at inanimate rather than animate targets 
(Forrester et al. 2013). It is argued by Forrester and colleagues (2013) 
that this is in line with what is reported for tool-use tasks in wild chim- 
panzees where a right-hand population-level bias has been shown for 
some tool-use tasks (but see next section). However, animate targets are
likely to be highly relevant for communicative gestures, where a right-hand
population bias has been observed in multiple species. Therefore,
this target distinction cannot account for all the variation we observe in 
primate handedness, particularly if we include human infants and tod- 
dlers who have also shown a right-hand population-level bias for ges- 
tures in other studies (Cochet and Vauclair 2010). Additionally, for com- 
municative gestures, whether the conspecific is in the left or right visual 
field of the signaller can further influence hand preference (Prieur et al. 
2017). Overall, despite decades of research on primate handedness, we
still lack a consensus on whether, and to what degree, species exhibit
population-level handedness and, if so, how specialized this is
for a particular task or type of action, whether communicative or not.

Generally speaking, captive studies are powerful due to their clear 
experimental design and standardized tasks, but they also lack natural 
socio-ecological contexts to provide insight into the evolutionary drivers 
of handedness (McGrew and Marchant 1997). These studies have been 
further criticized for the influence of captive rearing environments and 
the effects of learning (McGrew and Marchant 1997; Palmer 2003; 
Warren 1980; see also Hopkins and Cantalupo 2005). However, these 
latter concerns seem generally unfounded given that nonhuman primates 
in the wild have also exhibited population-level biases suggesting hand- 
edness is not unique to captive settings. For example, some studies on 
wild chimpanzees report population-level right-handedness for certain 
tool-use tasks similar to captive studies (e.g., Lonsdorf and Hopkins 
2005) although there is considerable variability in these findings (see 
next section). It is therefore prudent to seek parsimony between captive
and wild studies with regard to handedness when possible.


A WILD PERSPECTIVE ON HANDEDNESS: WHY IT MATTERS 


While research on captive primates provides much-needed experimental 
control, research in the field gives us the behavioural contexts in which 
primates naturally exhibit handedness and have evolved to do so. Of
particular relevance here is the variety of tool-use behaviours wild
primates show in foraging but also in socio-communicative contexts. Much of
the tool-use repertoire of wild primates has been shown to be socially
influenced, and even cultural (Sanz, Call, and Boesch 2013); there are
therefore stark ontogenetic differences between behaviours that are part
and parcel of a species’ evolved repertoire and artificial tasks that
they might be trained to do in captivity. We know for humans how
developmental and environmental factors can be critical to establishing
population-level handedness, including social and cultural influences, even
from a young age (McManus 2019). Hence, data on wild primate populations
provide a critical and necessary perspective for our understanding
of the origins and development of population-level handedness within
the Primate order. It is also worth noting that the majority of handedness
studies in wild nonhuman primates are on chimpanzees; therefore, much
of the rest of this chapter focusses on chimpanzees.

Some of the first studies on laterality in wild primates were conducted 
via behavioural observations of wild chimpanzees using tools. Here we 
find an often convoluted stream of research findings where for the same 
population of wild chimpanzees early studies may have failed to find 
population-level handedness but later studies, with the inclusion of more 
behaviours and/or more individuals, would confirm population-level 
handedness (e.g., the chimpanzee nut cracking studies described below). 
Importantly, hand preference, whereby individuals demonstrate the con- 
sistent use of either the left or right hand for a task, has been consistently 
observed in chimpanzees in both captive and wild contexts from even the 
earliest investigations of handedness (e.g., Finch 1941; Boesch 1991). 
However, individuals may differ in their preferred hand across tasks, and 
most importantly, results for population-level handedness have been 
inconsistent as described in the following section. 

The Bossou chimpanzees of Guinea and the Mahale chimpanzees of 
Tanzania have arguably provided the most in-depth studies in wild chim- 
panzee handedness, given the number of behaviours investigated and the 
longitudinal nature of their datasets. Other populations, such as Tai, 
Gombe and Goualougo, have also contributed significant works on hand- 
edness in wild chimpanzees, especially for tool use behaviours. Overall, 
these studies provide mixed results. With regard to nut-cracking
behaviour, whereby chimpanzees use wooden or stone tools to crack open
various species of nuts in the wild (Boesch and Boesch 1981), findings
on handedness are inconsistent. Studies on the Bossou chimpanzees,
which crack open nuts using not only hammer tools but also moveable
anvils, have found individual-level hand preferences for holding the
hammer but no significant population-level bias for the left or
right hand (Humle and Matsuzawa 2009; Sugiyama et al. 1993). Similarly,
no population-level hand bias was found for nut cracking in
the Tai chimpanzees of Côte d’Ivoire (Fig. 1) (Boesch 1991). However, a
later study conducted by Lonsdorf and Hopkins (2005) where nut-crack- 
ing data was pooled from two populations (Tai: Boesch 1991; Bossou: 
Biro et al. 2003) found support for a population-level right-hand bias, 
implying that a larger sample size of individuals was necessary for 
detecting a population-level bias. However, it is then debatable to what 
degree combining two distinct chimpanzee communities represents a nat- 
urally occurring population-level phenomenon. Notably, this study also 
found evidence for heritability in hand preferences when examining 
mother-offspring hand use (Lonsdorf and Hopkins 2005). 
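
The sample-size point can be illustrated with an exact binomial test. The
following is a minimal sketch in Python, not the analysis used by Lonsdorf
and Hopkins (2005); the function name and the counts of right-preferent
individuals are hypothetical and chosen only for illustration. It shows how
a similar proportion of right-handers can fail to reach significance within
each community yet do so once the samples are pooled.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: the summed probability of every
    outcome that is no more likely than the observed one."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # Small tolerance so that equally likely outcomes are not lost to rounding.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts (right-preferent individuals, individuals sampled).
# These are NOT real data from any field site.
site_a = (13, 20)
site_b = (14, 20)
pooled = (site_a[0] + site_b[0], site_a[1] + site_b[1])

for label, (right, total) in [("site A", site_a), ("site B", site_b), ("pooled", pooled)]:
    print(label, right, total, round(binomial_two_sided_p(right, total), 3))
```

Whether such a pooled sample reflects a naturally occurring population is a
separate question that the test itself cannot answer.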

Besides nut cracking, termite fishing is perhaps the best-studied tool 
use behaviour in wild chimpanzees. Termite fishing is a behaviour that 
exhibits clear cultural variation among different populations but can be 
generally described as chimpanzees using herbaceous vegetation or stick 
tools by inserting them into termite nests to collect the insects for eating 
(Boesch et al. 2020). Lonsdorf and Hopkins (2005) reported a left-handed
population bias for termite fishing in the Gombe chimpanzees,
for which McGrew and Marchant (1992) had also documented a trend
toward a left-hand bias at the population level. Similarly, the Fongoli
chimpanzees of Senegal show a trend toward population-level
left-handedness for termite fishing (Bogart et al. 2012). However, the Goualougo
chimpanzees of Central Africa do not show a clear population-level hand
bias despite strong hand preferences at the individual level, as well as
documented performance asymmetries. For example, chimpanzees with
a right-hand preference had a faster stick tool insertion time than those
with a left-hand preference (Sanz, Morgan, and Hopkins 2016). McGrew
and Marchant (1999) also reported that lateralized individuals were more
efficient at gathering termites compared to individuals that did not show
a clear hand preference.

Fig. 1.

An adult female Tai chimpanzee nut cracking with a stone hammer as her
daughter watches. Credit: Liran Samuni/Tai Chimpanzee Project.

A number of other chimpanzee tool use behaviours have also been 
investigated with regard to handedness. Bossou chimpanzees have
shown population-level right-hand bias for an extractive foraging beha- 
viour called ant dipping where herbaceous stalks or sticks are used as 
tools to collect army ants for eating (Humle and Matsuzawa 2009). No 
significant population-level bias was found for other tool use behaviours, 
namely algae scooping and pestle pounding (Humle and Matsuzawa 
2009). There was also no population-level hand bias found for aimed 
throwing (of various naturally occurring objects) using a long-term data- 
set of the Mahale chimpanzees despite some clear hand preferences at the 
individual level (Nishida, McGrew, and Marchant 2012). The Tai chim- 
panzees had clear individual-level hand preferences for wadge dipping, 
where individuals use one hand to repeatedly dip chewed-up wadges of a 
particular fruit into water, but again showed no population-level hand 
bias (Boesch 1991). Combined, these studies provide mixed support for
the task complexity hypothesis, given that, of all the tool-use behaviours
investigated in the wild, nut cracking is arguably the most complex and
clearly involves bimanual coordinated actions.

Research on the handedness of wild monkey species is relatively rare, 
despite baboons and macaques being highly terrestrial and generally easy 
to observe and identify. One notable study by Leca and colleagues (2010) 
on the stone handling behaviour of free-ranging Japanese macaques com- 
pared unimanual and bimanual actions during stone manipulation and 
found lateralization for the more complex bimanual actions only (note no 
population-level handedness was found when looking across multiple 
tasks). These results lend additional support to the task complexity 
hypothesis although additional studies of other wild populations are des- 
perately needed. Handedness research in the wild may be complicated by 
observational conditions but also the lack of individuals engaging with 
the exact same behaviour frequently enough to calculate robust handed- 
ness indices within a population, let alone across. Still, this is where a 
complementary perspective, combining both captive and field work, can 
be fruitful for constructing larger frameworks in which to investigate the 
evolution of handedness. 
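
As an illustration of what such indices involve, the sketch below computes
a handedness index, HI = (R - L)/(R + L), and a z-score against equal use
of both hands for each individual. This is a commonly used convention in
the handedness literature rather than a method specified in this chapter;
the function names, the threshold of ten bouts and the tallies are all
hypothetical.

```python
from math import sqrt

MIN_BOUTS = 10      # assumed minimum number of observations per individual
Z_CRITICAL = 1.96   # two-sided 5% threshold under a normal approximation


def handedness_index(right: int, left: int) -> float:
    """HI = (R - L) / (R + L); -1 is always left, +1 is always right."""
    total = right + left
    if total == 0:
        raise ValueError("no observations for this individual")
    return (right - left) / total


def classify(right: int, left: int) -> str:
    """Label an individual using a z-score against 50:50 hand use."""
    total = right + left
    if total < MIN_BOUTS:
        return "insufficient data"
    z = (right - left) / sqrt(total)  # equals (R - n/2) / sqrt(n / 4)
    if z >= Z_CRITICAL:
        return "right-handed"
    if z <= -Z_CRITICAL:
        return "left-handed"
    return "ambiguous"


# Hypothetical tallies of (right, left) tool-use bouts; not real data.
tallies = {"A": (18, 2), "B": (3, 17), "C": (11, 9), "D": (4, 1)}
for name, (r, l) in tallies.items():
    print(name, round(handedness_index(r, l), 2), classify(r, l))
```

Individual D shows why infrequently observed behaviours are a problem: four
of five bouts with the right hand yields a high index, but too few
observations to classify the individual with any confidence.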


TOOL USE, LANGUAGE & LATERALITY COME HAND IN HAND 


Functionally, it has been suggested that laterality likely conveyed an evo- 
lutionary benefit whereby hemispheric specialization via co-evolution 
mechanisms for tool use and language, facilitated by handedness and per- 
formance asymmetries, would permit faster and more efficient cognitive 
processing (Hopkins, Mareno, and Schapiro 2019; Prieur et al. 2019; 
Stout et al. 2008). Such an evolutionary pathway would have become 
particularly important for tasks involving complex bimanual actions and 
enhanced dexterity, especially for a bipedal primate (MacNeilage, 
Studdert-Kennedy, and Lindblom 1987; Meguerditchian, Vauclair, and 
Hopkins 2013; Prieur et al. 2019; Westergaard, Kuhn, and Suomi 1998). 
Tool use and tool manufacture (the latter for hominins only) have
therefore been a significant focus of theories of the evolution
of handedness (Gabrić, Banda, and Karavanić 2018; Stout and
Chaminade 2012). Generally speaking, current theories for how laterality 
and handedness may have evolved in our species coincide on themes of 
a tool-using biped whose social and ecological environment would have 
favoured intentional communication with conspecifics. The strongest 
evidence for this evolutionary scenario comes from neuroanatomical 
studies which show similar brain asymmetries for hand motor tasks (e.g., 
tool use) and communicative actions, such as gestures (for detailed 
reviews see Hopkins, Mareno, and Schapiro 2019; McManus 2019; 
Prieur et al. 2019; Stout and Chaminade 2012). Such an evolutionary sce- 
nario therefore links the emergence of left hemispheric specialization in 
the primate lineage with the emergence of bipedalism and tool use, as 
primers for language to evolve by co-opting the pre-existing
neuro-architecture of a left hemisphere specialization. Details and the
order of emergence are still up for debate, but many (though definitely
not all) scholars agree that these traits appear to be highly
interconnected and linked in their neurobiological origins and development.

Unlike tool use and bipedalism, language does not leave behind 
archaeological evidence that can be examined and dated to reconstruct 
evolutionary histories. It is therefore not a coincidence that the origins of 
language have perplexed scholars for centuries, and continue to do so 
(Corballis 2017; Fitch 2010). The search for the origins of language 
among nonhuman primates has generally favoured a gestural origin 
given that gestures, particularly well studied in the great apes, have con- 
sistently demonstrated clear communicative intent, greater control, 
increased flexibility and innovation relative to a limited vocal repertoire 
(Arbib, Liebal, and Pika 2008; Call and Tomasello 2007; Christiansen 
and Kirby 2003; Fröhlich et al. 2019; Liebal et al. 2013; Pika et al. 
2005). Yet others argue that the framework used to evaluate gestures
has typically not been applied to vocalizations, and that the bias
towards gestures is therefore partly a result of methodological discrepancies
(Townsend et al. 2017; but see also Fischer 2017). However, the gestural 
theory for language origins also has strong support from disciplines other 
than primatology, namely neuroscience. The discovery of mirror neurons 
in the F5 brain region of monkeys, a homologue of Broca’s area, one of 
the critical language centers in the human brain, has lent strong support 
for gestural motor theories of language origins (Rizzolatti and Arbib 
1998). As mentioned previously, this research supports the view that the
neurobiological substrates responsible for motor coordination were co-opted for
the evolution of language within the primate lineage (Arbib, Liebal, and 
Pika 2008; Corballis 2003; Kohler et al. 2002; Rizzolatti and Arbib 
1998). Importantly, mirror neurons are activated not just when an indi- 
vidual performs an action, but also when watching others perform that 
action (Arbib 2005; Kohler et al. 2002). This suggests mirror neurons 
may facilitate social learning mechanisms such as imitation, a cognitive 
skill argued to be necessary for developing uniquely human traits such as 
language (Arbib 2005; Corballis 2017) and cumulative culture (Henrich 
and Tennie 2017). Importantly, the gestural theory of language evolution 
has some overlapping neurobiological support with the technological or 
tool-use hypothesis for language evolution, although slight differences 
are also recognized (for a detailed review see Stout and Chaminade 
2012). 

More recently, a multimodal origin for language evolution has 
become increasingly supported in the literature, again primarily via 
detailed studies of great ape communication, given that auditory, orofa- 
cial and gestural signals are often combined with one another by both 
humans and nonhuman primates (Arbib, Liebal, and Pika 2008; Fröhlich
et al. 2019; Liebal et al. 2013; Taglialatela et al. 2015). Studies on multi- 
modal communication, particularly in chimpanzees, have revealed sig- 
nificant associations with both motor coordination and sound processing. 
For example, a study on captive chimpanzees found that individuals who
combined vocalizations and orofacial movements to produce
attention-getting sounds had a higher deposition of gray matter in brain regions
associated with motor control, compared to individuals that did not pro- 
duce these signals (Bianchi et al. 2016). Similarly, other researchers have 
proposed that oropharyngeal motor coordination, such as lip-smacking in 
primates, creates rhythmic sound utterances that may have provided the 
ancestral basis for human speech (Bergman 2013; Ghazanfar and Taka- 
hashi 2014). These slight movements of the mouth, tongue and larynx are 
thereby argued to have facilitated the transition in language evolution 
from a primarily gestural mode to the acoustic channel (Corballis 2017). 
Despite their clear relevance to theories of language evolution, multimodal
investigations of primate communication remain relatively rare,
with researchers usually focussing on a single modality (Slocombe,
Waller, and Liebal 2011). 

Similarly understudied yet relevant to the multimodal origins of lan- 
guage evolution are the incidental sounds produced by manual actions 
that can potentially serve as sources of information to both producers and 
listeners. Previous scholars have remarked that incidental sounds, pro- 
duced by chewing or locomotion (Larsson 2014; MacNeilage 1998), or 
by the manual acts of using tools (Larsson 2015), all rely on motor 
coordination and likely stimulate mirror neurons. Moreover, mirror neu- 
rons can be activated by simply hearing these incidental sounds alone 
(Kohler et al. 2002). In sum, there is some evidence to suggest that tool- 
use sounds which necessitate manual dexterity are linked to the neuro- 
biological substrates of language perception and production in humans 
(Arbib 2005; Bianchi et al. 2016; Corballis 2003; Larsson 2015; Rizzo- 
latti and Arbib 1998). However, other neurobiological studies do not 
show an overlap between motor control brain regions involved in object 
manipulation with language area homologs (Becker et al. 2022; Fitch and 
Braccini 2013). Nevertheless, the incidental sounds and the role of audi- 
tory (non-vocal) signals have generally not been factored into these 
studies and are usually ignored for their relevance to both tool use and 
communication. I suggest that these auditory signals need to be 
examined in greater detail, particularly for their ability to provide critical 
information not only to the tool user or producer, but also to any 
bystander or listener. In this sense, they may play a role in the learning 
and transmission of manual actions such as tool use or manufacture, and 
importantly, provide a pathway that intimately connects auditory chan- 
nels of perception and processing with hand-eye motor coordination and 
left-hemispheric specialization, potentially significant for language 
origin theories. 


THE (NEGLECTED) ROLE OF IMPACT SOUNDS 


Auditory, non-vocal signals have been of little interest to most
primatologists concerned with communication other than recognizing that
primates often produce displays whereby objects may be incorporated to
some extent to produce sound. Lameira and colleagues (2012) describe
an instrumental gesture-call in orangutans, as a call that can “modify oro- 
laryngeal acoustic production, with or without tools” which, while tech- 
nically non-vocal is still produced by the animal. Similarly, chimpanzee 
buttress drumming (Arcadi, Robert, and Boesch 1998) or leaf clipping 
(Kalan and Boesch 2018), are sounds that are produced by the animals 
interacting with an object in their environment and are therefore also 
non-vocal. Although interesting behaviours for animal communication, 
in the present context I am particularly concerned with sounds produced 
by nonhuman primates as they interact with stone tools, given the 
obvious parallels and implications for hominin lithic technology and its 
evolution. 

The sound produced when a stone makes contact with another object 
can be referred to as an impact sound. Impact sounds are significant in 
that their acoustic properties will be dependent upon, and characteristic 
of, both the material properties of the impactor and the object being 
impacted (Kalan et al. 2019). We are surrounded by impact sounds in our 
daily lives, from the musical instruments we hear or play to the mechan- 
ical noise of machines and vehicles. Although archaeologists have long 
remarked on the potential importance of impact sounds while making 
stone tools (i.e., flint knapping), the sounds themselves have rarely been
featured in thorough investigations until recently. Researchers have sug- 
gested that the knapping sound may provide information regarding the 
accuracy of the strike or quality of the knapping material (cf. Patten 2009 
in Smith et al. 2021). In a one-of-a-kind study, a controlled hammer 
machine was used to impact various lithic materials and demonstrated 
significant differences in sound duration, pitch and loudness (DeForest 
and Lyman 2022), suggesting ancient knappers would have been able to 
use impact sounds to assess lithic quality. Additionally, knapping sounds 
may offer information such as the size of the flake or the level of
expertise of the knapper (Smith et al. 2021). In their recent study, Smith
and colleagues (2021) trained musicians to process the sounds produced by
human flint-knappers and found significant differences in the pitch and
octaves produced, attributable to raw material and knapper skill level,
but no differences in sound that could be attributed to flake size. Of
course, we cannot directly observe the ancestral hominins who first start- 
ed to make and use stone tools and instead must rely on staged, experi- 
mental settings with contemporary humans. Here, a comparative 
approach can once again be of value given that a number of nonhuman 
primate species use stone tools naturally in the wild, including one of our 
closest living relatives, the chimpanzee (for knapping experiments done 
with nonhuman great apes see Motes-Rodrigo and Tennie chapter in this 
book). 

Although chimpanzees do not make stone tools, they do use stone 
tools, for both cracking open nuts (Boesch and Boesch 1981) and for a 
unique communicative behaviour where they repeatedly throw stones at 
particular trees, referred to as accumulative stone throwing (AST) (Kühl
et al. 2016). I recently investigated the impact sounds produced by AST,
namely the sounds of stones hitting the trees used as AST sites by
the chimpanzees. I found that the tree species used for AST had a more
resonant timbre (i.e., a longer lasting, lower frequency sound) than
other trees that were widely available but not used for this behaviour (Kalan
et al. 2019). Were the chimpanzees drawn back to these particular AST 
trees just because they found the sounds they produced to be pleasant to 
the ear? Unfortunately, we do not know and this remains to be investi- 
gated. Similarly, this research has reignited my interest in the sounds pro- 
duced by nut-cracking chimpanzees that I personally observed in the Tai 
forest of Côte d’Ivoire (Fig. 2). Are chimpanzees paying attention to nut- 
cracking impact sounds? If so, what kind of information are they extrac- 
ting and potentially using? One way in which we might begin to address 
these questions is by incorporating the sounds produced during stone tool 
use into the robust frameworks of handedness and laterality research 
described in this chapter. Specifically, we can make predictions regarding 
the degree of laterality expected if sounds are produced. For example, we 
may predict that tool-use behaviours that produce impact sounds are
more likely to activate a left hemispheric specialization (i.e., ‘language
areas of the brain’, Stout and Chaminade 2012), and are therefore more
likely to show population-level right-handedness. Most
importantly, such a prediction allows us to compare patterns in both 
humans and nonhuman primates using the same framework, given that 
instructions in the form of speech or gestures would not be necessary. 
Impact sounds themselves can be acoustically described via spectral and 
temporal properties of the sound signal (Kalan et al. 2019), and acoustic 
patterns can be characterized in relation to the fine-grained manual 
actions of the tool use behaviour. Sounds themselves could also be
modified or altered, either artificially or by introducing novel materials.
Importantly, the auditory component of tool use should only be
considered alongside the significant haptic and visual-motor sensory
components of tool use. Such analyses could include grip type and grasping
style, along with the fine-grained hand movements required to achieve a
particular task, which are suggested to affect hand preference, specialization
and handedness (Hopkins, Cantalupo, and Wesley 2002; Lambert 2012).
For example, in humans, the dominant hand for right-handed individuals
demonstrates greater grip strength (Incel et al. 2002; but see Bardo et al.
2021 for a recent review on the topic). Trade-offs between power and
precision grips are significant for the evolution of hominin technology
given their influence over the control of manual actions involved in
processes like tool manufacture and tool use (Karakostis et al. 2018).
Temporal integration of grip variation alongside sounds could therefore
provide insight into potential mechanisms by which auditory signals
mediate, or provide feedback, during processes such as knapping or other
stone tool use behaviours in humans and nonhumans alike (Fig. 3).

Fig. 2.

Spectrogram of the impact sounds produced by a nut cracking chimpanzee
in the Tai forest of Côte d'Ivoire. The audio signal has been extracted
from a remote camera trap (Bushnell Trophy Cam) video which has a
sampling rate of 11 kHz and 24 bits/s. Credit: MPI-EVA/PanAf/TCP.

Fig. 3.

Diagram illustrating multisensory components relevant to tool use and the
evolution of handedness, particularly the auditory information available
via tool use impact sounds. Tool-use impact sounds are expected to be
processed in the auditory cortex of the brain and integrated with visual
and haptic sensory information, to subsequently guide variations in
hand-use and specialization via adjustments in grip, force and accuracy
for the task. Credit: chimpanzee drawing in the center by William B. Snyder.
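
To make the idea of describing impact sounds by their spectral and temporal
properties concrete, the sketch below computes two generic descriptors for
synthetic signals: the spectral centroid (a magnitude-weighted mean
frequency) and the time after which the signal no longer reaches 10% of its
peak amplitude. These are not necessarily the measures used by Kalan et al.
(2019); the sampling rate, the damped-tone model of an impact and all
function names are assumptions made purely for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed value for this synthetic example


def damped_tone(freq_hz: float, decay_per_s: float, n: int = 256) -> list:
    """Synthetic stand-in for an impact sound: an exponentially decaying sine."""
    return [
        math.exp(-decay_per_s * i / SAMPLE_RATE)
        * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
        for i in range(n)
    ]


def spectral_centroid(signal: list) -> float:
    """Magnitude-weighted mean frequency (Hz), using a naive DFT up to Nyquist."""
    n = len(signal)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
        magnitude = abs(coeff)
        weighted += (k * SAMPLE_RATE / n) * magnitude
        total += magnitude
    return weighted / total


def decay_time(signal: list, fraction: float = 0.1) -> float:
    """Time (s) of the last sample whose amplitude is at least `fraction` of the peak."""
    peak = max(abs(x) for x in signal)
    last = max(i for i, x in enumerate(signal) if abs(x) >= fraction * peak)
    return last / SAMPLE_RATE


# Two hypothetical sounds: a low, slowly decaying one and a high, quickly decaying one.
resonant = damped_tone(freq_hz=400, decay_per_s=20)
dull = damped_tone(freq_hz=2000, decay_per_s=200)
for label, sound in [("resonant", resonant), ("dull", dull)]:
    print(label, round(spectral_centroid(sound)), "Hz", round(decay_time(sound), 4), "s")
```

Descriptors of this kind could, in principle, be aligned in time with coded
manual actions, although real recordings would require windowing, noise
handling and a faster transform than the naive one used here.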


CONCLUSION 


Current research leaves us with mixed results and a lack of consensus on 
the origins of handedness, particularly given the growing number of 
intrinsic and extrinsic factors that appear to influence population- and 
individual-level handedness. Therefore, by adopting a broad, multimodal 
perspective we may gain new insights to help clarify the evolutionary 
relationships between manual dexterity for complex tasks, tool use, 
handedness and the emergence of language in humans. To achieve this, 
future comparative research will benefit from bridging the gap between 
communication research and tool use research when it comes to ques- 
tions regarding handedness and hand-use. In particular, a focus on the 
common auditory component to both communication and tool use could
provide a fruitful way forward. There is always the possibility that there
is not much to be gained by listening to the impact sounds of tools (i.e., 
little to no information added beyond the visual and haptic senses). Yet 
given the ease with which observers of human knapping experiments, or 
observers of wild chimpanzee nut cracking, can attune their ears to the 
slight variations in the sounds produced when a stone strikes another 
object (Boesch and Boesch 1981), it is definitely an area of investigation 
that deserves more attention. Research on the evolution of handedness 
will further benefit from a greater focus on comparative studies conduct- 
ed using natural observations and behaviours, especially for wild nonhu- 
man primates, rather than relying primarily on captive experiments 
which currently dominate the literature on this topic.


CHAPTER SEVEN


Dental fricatives: Patterning, evolution, and factors 
affecting a rare class of speech sounds 


Dan Dediu, Jingmin Lin, Scott R. Moisik,
Steven Moran


Abstract 


The (inter-)dental non-sibilant fricatives, consonants articulated with the tongue 
tip or blade against or between the front teeth, are rare among the world’s lan- 
guages but, nevertheless, are present in the sound inventories of some of the 
most spoken languages in existence. Here we try to shed light on the reason(s) for 
their distribution using multiple approaches, ranging from examining large cross- 
linguistic databases and phylogenetic reconstructions to the analysis of speech 
production data and anatomical measurements of rigid oral cavity structures 
obtained using intraoral 3D optical scanning. With these, we confirm not only
that dental fricatives are rare among present-day languages, but also that
they have likely been so as far back as language families can be reconstructed, 
and that they are rarely borrowed between languages. The experimental data 
from L2 English speakers seem to suggest that details of the anatomy of the ante- 
rior vocal tract may play a role in the success of their acquisition. Therefore, den- 
tal fricatives are rare speech sounds for a multitude of reasons touching upon 
their articulation, acoustics, and confusability with other speech sounds, includ- 
ing the difficulty to produce in both L1 and L2 acquisition, the difficulty to perceive 
in L2 and in borrowing situations, and the rarity of attested sound changes
producing them. They are also frequently lost or merged with other phonemes
in language change. Moreover, our data suggest that tiny, continuous, and
overlapping patterns of variation in the anatomy of the anterior oral vocal
tract may help explain their instability and geographic patterning.


INTRODUCTION


The (inter-)dental non-sibilant fricatives are consonant speech sounds 
articulated by placing the tongue tip or blade against (or between) the 
front teeth (Ladefoged and Maddieson 1996). They are rare cross-lin- 
guistically and are difficult to acquire for L1 and L2 learners alike. Here 
we try to shed some light on these intriguing speech sounds through the 
use of a multi-pronged approach that combines quantitative studies of 
their present-day cross-linguistic distribution and of the factors that 
affect their evolution (including sound change and language contact), 
with an experimental investigation into the factors that influence their 
successful production (or not) by second-language speakers of English. 
When we contextualize dental fricatives within the existing literature, it 
becomes clear that they are rare for multiple substantive reasons, and that
their distribution and evolution are driven by linguistic, socio-demographic,
and biological factors.

This paper is structured as follows: first, we introduce the main ques- 
tions that guided our research, followed by a description of the data and 
methods used, then we summarize the main findings (all the data, code 
and results are openly available), and we end with a discussion and ideas 
for future research. 


THE SEVEN QUESTIONS 
What are they? 


Dental and interdental non-sibilant fricatives (henceforth dental fricatives),
broadly symbolized as voiceless [θ] (as in English third and method) and
voiced [ð] (as in English the and mother), are a class of
weak fricatives with common articulation involving the tongue tip or 
blade positioned in proximity to the upper front teeth. The exact place- 
ment of the tongue relative to the teeth can vary in anteriority, with the 
term interdental indicating a more extreme anterior placement of the 
tongue such that it protrudes between the upper and lower incisors 
(which must be slightly separated by jaw opening). This variation in 
exact place of articulation produces very little difference in the auditory 
quality of the fricative, and English is an example where speakers vary 
between dental and interdental productions within and across varieties 
(Ladefoged 1990: 343). Should narrow transcription differentiation be 
required, the extended IPA provides the diacritic (Ball, Howard, and 
Miller 2018) to denote interdental place of articulation. There is a wider 
class of (inter)dental sounds comprised of various other manners of artic- 
ulation, including approximants, which have been reported in some lan- 
guages in the Philippines and Western Australia (Mielke et al. 2011; 
Olson et al. 2010), and interdental stops, reported in some Australian lan- 
guages (Ladefoged and Maddieson 1996). 
Our focus here is strictly on fricatives, and we note that the sounds we
consider are distinct from the sibilant dental fricatives [s z]. (Sibilant 
interdental fricatives are a theoretical possibility, but it is difficult to pro- 
duce comparably strong frication with an interdental posture because of 
the absence of a downstream obstruction like the teeth capable of gener- 
ating a strong noise source). Being non-sibilant, dental fricatives gener- 
ate little noise, and lack the distinctive “hiss” that sibilants possess 
(Ladefoged and Johnson 2010). They are also quiet and among the least 
perceptually salient consonants. Their overall low intensity is reflected in 
a spectrogram as a relatively faint, diffuse spectrum, with “no clearly 
dominating peak at any particular frequency region” (Jongman, Way- 
land, and Wong 2000), although some studies estimate that the main 
noise energy lies in the high-frequency band of around 6000 to 8000 Hz 
(Fry 1979; Strevens 1960). In English, as members of the non-sibilants 
(/f, v, θ, ð/), they have slightly shorter durations than the sibilants (/s, z, ʃ,
ʒ/), although this tendency is dwarfed by the much larger difference in
duration between the voiced and voiceless fricatives; they also have 
lower overall amplitudes and higher variability (across a range of meas- 
ures, such as the spectral moments) than the sibilants (Maniwa et al. 
2009). Whereas frication noise in sibilant fricatives is sufficient for lis- 
teners to distinguish between place of articulation, interdental (and labio- 
dental) fricatives with more diffuse spectra are difficult to reliably distin- 
guish, so that listeners must rely more heavily on F2 transition (Wright 
2004). The voiced dental fricative /ð/ can typically be differentiated from 
its voiceless counterpart /θ/ by the presence of glottal pulses, which manifest
in the spectrogram as a series of vertical striations. In addition, /ð/
also tends to be characterized by the existence of a “voice bar”, a dark 
band of low frequency energy near the baseline of the spectrogram 
(Ladefoged and Johnson 2010). Figure 1 shows a spectrogram illustrating
canonical English [ð] and [θ].
Fig. 1.
Spectrogram of a careful, deliberate reading of the phrase “The fins are thin”
by a male native speaker of Canadian English (SRM). Please note also the
spectral similarities between the voiceless dental fricative [θ] and the
voiceless labiodental fricative [f].

What is their current cross-linguistic distribution?


The dental fricative phonemes /θ/ and /ð/ are reported to be rare among
the world’s languages (Maddieson 1984; Moran and McCloy 2019), but 
can we quantify and describe their distribution in more detail? We 
approach this question using the data from the latest currently available 
version of PHOIBLE (Moran and McCloy 2019) and Glottolog 
(Hammarström et al. 2020). 


Can we know if they were used in past languages? 


Ascertaining the presence of speech sounds in extinct (proto-)languages 
is a formidable task, but even more so when they are rare. So, what can 
we say about the presence of dental fricatives in ancient and recon- 
structed languages, if anything? To answer this question, we use the data 
available in the latest version of the BDPROTO database (Marsico et al. 
2018; Moran, Grossman, and Verkerk 2020). 


How are they borrowed? 


Pretty much anything can be borrowed under the right circumstances 
(Thomason and Kaufman 1988), but the details are very complex, 
indeed. So, can we say anything about the languages that borrowed den- 
tal fricatives and under what circumstances? We use the data available in 
the SegBo database (Grossman et al. 2020) to try to answer this question. 


How are they transmitted vertically? 


Besides being (possibly) borrowed, dental fricatives also evolve verti- 
cally, being inherited, lost or innovated within language families. We 
investigate this using phylogenetic methods (Dunn et al. 2008) applied to 
the PHOIBLE database (Moran and McCloy 2019) in a few large fam- 
ilies that make the application of such methods possible, retrieved from 
D-PLACE (Kirby et al. 2016). 


Does vocal tract anatomy influence their acquisition and production? 


It is generally agreed that dental fricatives are hard to acquire for native 
speakers (Laitman et al. 1972) as well as for second-language learners, 
with several patterns of substitutions being reported and (partly) 
explained by orthography, perception, phonology, and phonetics. Here 
we analyze a large database of speakers from four large ethno-linguistic 
groups that are trained to produce dental fricatives, and where we try to 
predict their success rate at producing the intended /θ/ and /ð/ using various
individual measures, including measures capturing the anatomy of the vocal
tract. In effect, we are trying to see if there are aspects of the vocal tract
that influence the articulation of the dental fricatives and which may help
explain not only inter-individual differences in their production, but also
possibly their cross-linguistic patterning.


So, why are they rare? 


Finally, we hope that the (partial) answers to all of the questions above 
might help us explain not only why these speech sounds are rare, but also 
why they are distributed the way they are both between and within lan- 
guages. 


DATA AND METHODS 


Here we briefly summarize the data and methods used, but the full data, 
code and results are available in the GitHub repositories 
bambooforest/interdentals (https://github.com/bambooforest/interdentals) 
and ScottMoisik/DentalFricGit (https://github.com/ScottMoisik/DentalFricGit), 
for the cross-linguistic and experimental approaches, respectively. 
Moreover, the code needed to generate this paper is available in the 
GitHub repository ddediu/the (https://github.com/ddediu/the). We used 
R (R Core Team 2021) and Rmarkdown (Xie, Allaire, and Grolemund 
2018) through RStudio for data analysis, plotting and the generation of 
the reports and of this paper. 


Cross-linguistic approaches 


We used the following databases: PHOIBLE (Moran and McCloy 2019) 
for the presence/absence of dental fricatives across the world’s languages 
(in particular, for θ and ð); Glottolog (Hammarström et al. 2020) for
information about language families; D-PLACE (Kirby et al. 2016) for 
retrieving the phylogenies of large families used in the phylogenetic 
analyses; BDPROTO (Marsico et al. 2018) for the presence/absence of 
dental fricatives in ancient and reconstructed languages (selecting the 
phonemes “θ” and “ð”); and SegBo (Grossman et al. 2020) for data on
borrowing (selecting the phonemes “θ” and “ð”). While there might be
various sources of errors and biases in such databases (due to, for exam- 
ple, transcription traditions or the linguist’s familiarity with languages 
that possess dental fricatives), these are notoriously hard to spot and 
would need a study of their own; however, we do not expect such errors and
biases to substantially affect our conclusions here.
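
As a rough illustration of the presence/absence coding described above, the following Python sketch collapses several inventories per Glottocode into a single value. The toy records, the function names, and the rule that a Glottocode counts as having dental fricatives if any of its inventories lists θ or ð are assumptions made for illustration only; the actual analysis was done in R and is documented in the repositories mentioned above.

```python
from collections import defaultdict

DENTAL_FRICATIVES = {"θ", "ð"}

# Hypothetical toy records: (Glottocode, inventory ID, set of segments).
TOY_INVENTORIES = [
    ("stan1293", 1, {"p", "t", "k", "s", "θ", "ð"}),
    ("stan1293", 2, {"p", "t", "k", "s", "θ"}),
    ("dutc1256", 3, {"p", "t", "k", "s", "z"}),
    ("nucl1310", 4, {"p", "t", "k", "θ", "ð"}),
    ("hind1269", 5, {"p", "t", "k", "s"}),
]


def presence_by_glottocode(inventories, targets=DENTAL_FRICATIVES):
    """Collapse inventories to one True/False value per Glottocode."""
    presence = defaultdict(bool)
    for glottocode, _inventory_id, segments in inventories:
        # Assumption: one inventory with a target segment marks the language.
        presence[glottocode] = presence[glottocode] or bool(segments & targets)
    return dict(presence)


def share_with_feature(presence):
    """Percentage of Glottocodes coded True, rounded to one decimal."""
    return round(100 * sum(presence.values()) / len(presence), 1)


presence = presence_by_glottocode(TOY_INVENTORIES)
share = share_with_feature(presence)
print(presence)
print(share)
```

Counting per Glottocode rather than per inventory avoids giving extra weight to languages that happen to be described by several sources.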

Methodologically, all analyses were implemented in R using various 
packages (please see the corresponding analysis report for details). For 
the phylogenetic analyses, we used data for the Indo-European and Sino- 
Tibetan language families, as published in Chang et al. (2015) and Zhang 
et al. (2019), respectively, and available from D-PLACE (Kirby et al. 
2016). For this, we first pruned these phylogenies to the dental data avail- 
able (collapsing, in the process, several varieties belonging to the same 
Glottocodes into a single value), and then we generated stochastic character
maps (Huelsenbeck, Nielsen, and Bollback 2003; Nielsen 2002;
Revell 2012) using the make.simmap() function from package 
phytools (Revell 2012) with the “all rates different” model (ARD), 
where Q is set to “empirical” (maximum probability, full Bayesian 
MCMC) with 10 simulations. 
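
To make the “all rates different” idea concrete, here is a minimal Python sketch that simulates the history of a binary trait (0 = dental fricatives absent, 1 = present) along a single branch, with separate gain and loss rates. It is not the make.simmap() procedure, which also conditions the simulated histories on the observed tip states and on the tree; the function name and the rate values are hypothetical.

```python
import random


def simulate_branch_history(start_state, branch_length, gain_rate, loss_rate, rng):
    """Simulate a two-state history along one branch.

    Returns a list of (state, duration) segments whose durations sum to
    branch_length. Under ARD, 0->1 (gain) and 1->0 (loss) have separate rates.
    """
    history = []
    state, remaining = start_state, branch_length
    while True:
        rate = gain_rate if state == 0 else loss_rate
        # Exponential waiting time until the next change (never, if rate is 0).
        wait = rng.expovariate(rate) if rate > 0 else float("inf")
        if wait >= remaining:
            history.append((state, remaining))
            return history
        history.append((state, wait))
        remaining -= wait
        state = 1 - state


rng = random.Random(1)
GAIN, LOSS = 0.2, 1.0  # hypothetical rates: gains rarer than losses
histories = [simulate_branch_history(0, 1.0, GAIN, LOSS, rng) for _ in range(10)]
share_present_at_tip = sum(h[-1][0] for h in histories) / len(histories)
print(share_present_at_tip)
```

Averaging the state at a given point over many such simulated histories is what yields the branch-wise probabilities that stochastic character maps display.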


Experimental approach 


Here we capitalize on a large dataset, part of the ArtiVarK (Articulatory 
variation in speech and language) project conducted at the Max Planck 
Institute for Psycholinguistics, in Nijmegen, The Netherlands, between 
2012 and 2017 (for more details, please see Dediu and Moisik 2019). 
This project is covered by the ethics approval 45659.091.14 (1 June 
2015), Donders Center for Brain, Cognition and Behaviour, Nijmegen; 
for a full description, please see the Supplementary file 1 of Dediu and 
Moisik (2019) hosted at doi:10.5281/zenodo.1480427. While the full 
dataset comprises speech data, anatomical measurements, as well as 
detailed language background information from 96 participants from 
several large ethno-linguistic groups, we keep here only the L2 English 
speakers (thus removing 6 native English speakers). We also removed 7
other participants that were part of a convenience sub-sample used to 
“fine-tune” the experimental protocol, retaining here 80 participants, all 
L2 English speakers with no dental fricatives in their respective L1 
inventories, from four broadly defined ethno-linguistic groups: “Dutch” 
(speakers of Dutch from the Netherlands), “Chinese” (speakers of Sino- 
Tibetan languages), “North Indian” (speakers of Indo-Aryan languages), 
and “South Indian” (speakers of Dravidian languages). The participants 
were generally young (mean age 25) and had very little formal training 
in phonetics despite being highly educated. 

Linguistic speech data was acquired at the Donders Center for Brain, 
Cognition and Behaviour, Nijmegen, using: (1) an informal interview, 
using prompts such as “What other languages do you speak?” and “Tell 
me a little bit about your experience learning English; when were you 
first exposed, when was it formally introduced in school, and so forth,” 
for the purposes of eliciting naturalistic speech; and (2) a formal reading 
task, where participants were instructed to produce various sounds in an 
isolated /a_a/ context, a sentence context “I say /a_a/ for them”, and a 
tongue twister context, e.g. “This thin thought”. Anatomical data was 
obtained through high resolution intraoral 3D optical scans using a 
TRIOS® 3shape system at the Department of Orthodontics and Cranio- 
facial Biology, UMC Radboud, Nijmegen. In order to quantify these 
anatomical data, the intraoral scans were landmarked and traced using 
Landmark (Wiley 2007) and aligned and analyzed using a custom 
MATLAB script (see Figure 2). The landmarks and measures were chosen
to provide a reasonable coverage of features of the upper and lower jaws 
that might theoretically impact the articulation of dental fricatives 
(among other sounds that were part of the original ArtiVarK study; Dediu,
Janssen, and Moisik 2019; Dediu and Moisik 2019). After eliminating anatomical
variables that were too strongly inter-correlated (see Dediu and Moisik 2019)
and adding the Principal Components (PCs) of overall jaw morphology obtained
from the Principal Component Analysis (PCA) of the vertex data from the
intraoral scans (see Dediu et al. 2019), a final total of 34 anatomical
variables characterizing the various rigid structures of the vocal tract were
retained (see Figure 3 and the Appendix; these anatomical data can be found in
the supplementary materials of Dediu and Moisik (2019), freely available at
doi:10.5281/zenodo.1481941). Given the high inter-correlations between some of
the anatomical variables, we also ran a Principal Component Analysis across all
these variables, resulting in a set of Principal Components that capture the
same variation but are statistically independent. Participant sex, age, height,
weight, formal phonetic experience, self-declared English proficiency and the
broad ethno-linguistic group were also included. All independent variables
were standardized (z-scored).

From the acoustic recording, all tokens of dental fricatives were tran- 
scribed and coded auditorily, and on the basis of the acoustic signal and 
spectrogram using Praat (Boersma and Weenink 2001). As noted in 
previous studies, it is extremely difficult to annotate dental fricatives 
with absolute certainty: it is challenging to identify dental fricatives reli- 
ably from spectrograms, and even trained phoneticians tend to disagree 
on what classification should be used where dental fricatives are con- 
cerned (Moorthy and Deterding 2000). Therefore, a broad transcription 
was used as far as possible, i.e., diacritics (such as aspiration) were 
avoided unless required, to increase inter-rater reliability, but at the same 
time keeping in mind that most of these dental fricative judgments 
remain subjective (particularly in the case of voiced dental fricatives). 
Two raters coded the data as follows: the first rater (LJ) performed a tran- 
scription and segmentation of the data using Praat and a custom tool 
developed in MATLAB, while the second rater (SRM) provided an audi- 
tory judgment (assisted only by a waveform visualization) of whether, for 
each case of dental fricative transcribed by the first rater, the production 
could be considered successful (i.e., whether it was produced as the 
intended voiceless or voiced non-sibilant dental fricative or something 
else); θ and ð were examined separately. For each token, there were 4
tiers of labeling: segment (what was actually produced by the partici- 
pant), target (what the correct production should have been), position 
(word position of the dental fricative; initial, medial, or final), and word 
(the word that the dental fricative occurred in); the latter two are ignored 
here due to the low number of occurrences of certain values (e.g., dental 
fricatives in word-final position). This allows us to observe the various 
patterns of productions or substitutions that are being used, and to determine
the “success rate” for each participant (how often the participant accurately
produced the intended dental fricative). In the statistical analyses, we used
the second-pass coding.

Fig. 2.
Intra-oral scan of author SRM's upper jaw (seen from below, scale in
millimeters; not used in the study), showing the landmarks for the hard palate
anterior (hpa) and posterior (hpp), the left and right canines (cal/car),
second premolars (pml/pmr) and second molars (eml/emr), and the right incisor
tip (is), as well as the traces along the palate connecting these landmarks
(used during the measurement process). Please see the Appendix for more
information.

Fig. 3.
The full set of primarily classical measurements automatically derived from
the landmarks and traces using a custom MATLAB script for author SRM (not used
in the study). All scales are in millimeters. The following landmarks derived
in part from those in Figure 2 are employed for measurement: a point in the
coronal plane intersected by the second molar landmarks (eml and emr) near the
transverse suture (TS); the peak height of the palate roof (PH); the superior
(maxillary) midsagittal alveolar ridge inflection point (UA); the lingual
gingival margin of the upper (maxillary) incisors (UM); the incisal edge of
the (right) upper (maxillary) incisors (UE); the inferior (mandibular)
midsagittal alveolar ridge inflection point (LA); the lower (mandibular)
incisal edges of the (right) incisors (LE); and the centroid of the upper
second molar landmarks (eml and emr in Figure 2) (M2, not shown). The
(near-)midsagittal contour, offset to the right to include the full right
incisor, as seen in the coronal (panel A) and axial (panel B) plots of the
upper jaw, is divided into four sections (the green highlights): the palate
roof (TS to PH), the palate transition (PH to UA), the alveolar ridge (UA to
UM), and the right upper incisor (UM to UE). The lower jaw is represented by a
single section for the right lower incisor (LA to LE). Along with the half
arches of the coronal profile of the palate (coronal M2 arch; panel D) and the
dental arch (panel B), these sections were analyzed via simple angular
measures and 4th-order polynomial curve fits (with the coefficients listed in
descending powers). Measures of tongue tip area and available area (ttAvailA
in the Appendix) were computed by integrating the area underneath the alveolar
ridge and maxillary incisors down to the bottom of the sublingual margin (LA)
and subtracting the area beneath the lower incisors (cyan highlight; extending
to and delimited by the upper incisors) down to the same point (LA). Panel C
shows the extraction of panel A. Notations (abbreviation; ID in Appendix):
palate height (Py; pHeight), length (P; pLen) and width (Pyy; pWid), alveolar
ridge height (Ay; aRHeight) and depth (Ap; aRDepth), overjet (Oy; overjet) and
overbite (Op; overbite), and sublingual margin height (Sy; slMHeight) and
depth (Sp; slMDepth). Please see also Figures 10 and 11 and the Appendix.
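
The per-participant success rate and the substitution patterns described in the coding procedure above can be sketched in Python as follows. The toy tokens, participant labels, and function names are hypothetical and only mirror the segment and target tiers; they are not taken from the ArtiVarK data.

```python
from collections import Counter, defaultdict

# Hypothetical toy tokens: (participant, target, produced segment).
TOKENS = [
    ("P01", "θ", "θ"), ("P01", "θ", "t"), ("P01", "θ", "θ"), ("P01", "ð", "d"),
    ("P02", "θ", "s"), ("P02", "θ", "s"), ("P02", "ð", "ð"), ("P02", "ð", "d"),
]


def success_rates(tokens):
    """Share of tokens produced as intended, per (participant, target)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, total]
    for participant, target, produced in tokens:
        counts[(participant, target)][0] += int(produced == target)
        counts[(participant, target)][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}


def substitution_patterns(tokens):
    """Count which segments replace each target when production fails."""
    patterns = defaultdict(Counter)
    for _participant, target, produced in tokens:
        if produced != target:
            patterns[target][produced] += 1
    return {target: dict(counter) for target, counter in patterns.items()}


rates = success_rates(TOKENS)
substitutions = substitution_patterns(TOKENS)
print(rates)
print(substitutions)
```

Keeping θ and ð as separate targets matches the decision to examine the two sounds separately.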

Given the exploratory nature of this paper and the relatively small sample
size, we ran three broad types of statistical analyses, all implemented as
mixed-effects (also known as hierarchical) logistic regressions in which the
dependent variable (DV) is the binary success (“yes” if the second-pass coding
of the token matches the intended dental fricative, otherwise “no”), the fixed
effects (or predictors or independent variables, IVs) depend on the model
tested, and the participants are modeled as a random effect. The three broad
types are: (1) the independent contribution of each possible IV to predicting
the DV, (2) the overall contribution of those IVs that capture aspects of the
anatomy of the vocal tract (henceforth anatomical IVs or aIVs) to predicting
the DV above and beyond the non-anatomical IVs, and (3) step-wise statistical
model simplification, where we start with the full model containing all the
possible IVs and sequentially remove those IVs that are either too
inter-correlated with the other IVs or do not make a sufficiently important
contribution to predicting the DV, ending with a reduced model with
uncorrelated IVs that each contribute to predicting the DV.

Each of these was implemented using both frequentist and Bayesian approaches,
as follows. The frequentist approach used function glmer(...,
family=binomial(link="logit")) in library lme4 (Bates, Mächler, Bolker, and
Walker 2015), with fixed-effect p-values as given by the asymptotic Wald tests
or model comparison through anova() with a χ² test; we used throughout an
α-level of 0.05. The Bayesian approach used function brm(...,
family=bernoulli(link="logit")) in library brms (Bürkner 2021) with a
student_t(3, 0, 3) prior for all fixed effects; we used visual checks and R̂
for model convergence, visual checks of the posterior distributions, and the
practical equivalence test, ROPE-based p-values, and posterior null hypothesis
testing (as implemented by the functions equivalence_test() and p_rope() in
library bayestestR and the function hypothesis() in library brms,
respectively); we also used model comparison with Bayes factors, LOO, WAIC and
K-FOLD (n=10).

For both approaches to model simplification, we first used the VIF (Variance
Inflation Factor) with a cut-off of 5 to sequentially remove the highly
correlated IVs (i.e., at each step the IV with the highest VIF was removed
until all remaining IVs had a VIF < 5). However, while we used the actual
mixed-effects logistic model in the frequentist approach to estimate the VIFs,
in the Bayesian approach we used instead a “flat” linear model with a randomly
generated normal (N(0,1)) “fake” DV (as suggested by the creator of the brms
package, Paul-Christian Bürkner; this is based on the fact that the VIF of a
predictor depends only on the other predictors and not on the dependent
variable, in essence being a programming “trick” to use R’s
check_collinearity() function from package performance). This was followed by
the “importance”-based sequential removal of IVs, in the frequentist approach
using the AIC (Akaike Information Criterion)-based method implemented by R’s
step() function, followed by a p-value-based method; in the Bayesian approach,
we used the practical equivalence test, ROPE-based p-values, and posterior
null hypothesis testing, ranked in this order. We also performed a post-hoc
statistical power analysis to study the effect of sample size on the
probability of detecting the kind of anatomical effects on the success rate of
θ and ð observed in our data.
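
The VIF-based pruning step can be sketched in a few lines of Python. The sketch below corresponds to the “flat” variant only: for a linear model, the VIF of a predictor equals the matching diagonal entry of the inverse of the predictors' correlation matrix, so no dependent variable is needed at all. The toy predictors a, b, c, d and all function names are hypothetical, and the code is an illustration under these assumptions rather than the authors' R implementation.

```python
import random
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix of a dict {name: list of values}."""
    names = list(columns)
    z = {}
    for name in names:
        m, s = mean(columns[name]), pstdev(columns[name])
        z[name] = [(v - m) / s for v in columns[name]]
    n = len(z[names[0]])
    corr = [[sum(x * y for x, y in zip(z[p], z[q])) / n for q in names]
            for p in names]
    return names, corr


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names, corr = correlation_matrix(columns)
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def prune_by_vif(columns, cutoff=5.0):
    """Drop the predictor with the highest VIF until all VIFs are below cutoff."""
    kept = dict(columns)
    while len(kept) > 1:
        current = vifs(kept)
        worst = max(current, key=current.get)
        if current[worst] < cutoff:
            break
        del kept[worst]
    return list(kept)


rng = random.Random(42)
n = 200
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
predictors = {
    "a": a,
    "b": b,
    "c": [x + y + rng.gauss(0, 0.1) for x, y in zip(a, b)],  # nearly a + b
    "d": [rng.gauss(0, 1) for _ in range(n)],                # unrelated
}
kept = prune_by_vif(predictors)
print(kept)
```

Because c is almost an exact sum of a and b, one of these three is dropped, after which the remaining predictors all fall below the cut-off of 5.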

It is important to stress that we interpreted the results of these 
approaches jointly, trying to extract the strongest signals in our data, and 
that we see these results as purely exploratory and we intend them as gen- 
erating hypotheses for future, targeted studies. 


RESULTS 


The results are structured by question (except for the first question, 
which was already answered above), but the interested reader can find 
full details in the two GitHub repositories mentioned. 


What is their current cross-linguistic distribution? 


There are currently 2177 distinct Glottolog codes in PHOIBLE, of which 
220 (or 10.1%, representing 237 inventories) have dental fricatives. Also 
considering various diacritics and combinations, the dental fricatives 
appearing in more than two inventories are: 6 (160; 67.5%), 8 (123; 
51.9%), 6 (7; 3%), ð (6; 2.5%), tO (5; 2.1%), t0’ (5; 2.1%), ð: (4; 1.7%), 
ði (4; 1.7%), t0" (4; 1.7%), dd (3; 1.3%), ð (2; 0.8%), and @,(2; 0.8%). It 
is clear that the overwhelming majority is represented by θ and ð.

They are attested in all macroareas: Africa (76), Australia (28), Eur- 
asia (67), North America (21), South America (15) and Papunesia (12), 
but most inventories have only one dental fricative (152 representing 
64.1%), while 72 (30.4%) have 2, 9 (3.8%) have 3, 1 (0.4%) has 4, and 3 
(1.3%) have 5. There are slightly more inventories with a voiceless than 
with a voiced dental fricative (Fig. 4). 
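
The percentages in this subsection follow directly from the quoted counts, which the short Python check below recomputes. The variable and function names are ours; the counts are the ones given in the text.

```python
def percent(part, whole):
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100 * part / whole, 1)


# Counts quoted in the text above.
GLOTTOCODES_TOTAL = 2177
GLOTTOCODES_WITH_DENTALS = 220
INVENTORIES_BY_N_DENTALS = {1: 152, 2: 72, 3: 9, 4: 1, 5: 3}

inventories_total = sum(INVENTORIES_BY_N_DENTALS.values())
glottocode_share = percent(GLOTTOCODES_WITH_DENTALS, GLOTTOCODES_TOTAL)
inventory_shares = {
    n_dentals: percent(count, inventories_total)
    for n_dentals, count in INVENTORIES_BY_N_DENTALS.items()
}
print(glottocode_share)
print(inventories_total, inventory_shares)
```

The inventory counts sum to the 237 inventories mentioned above, and the shares reproduce the reported 64.1%, 30.4%, 3.8%, 0.4% and 1.3%.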


Can we know if they were used in past languages? 


There are 214 unique Glottolog codes in BDPROTO, of which 24 have 
dental fricatives (11.2%). The most frequent dental fricatives are [ð] (21
occurrences) and [θ] (15), followed by [9] (6), [©] (5), [8:] (4), [6:] (2),
[0°] (2), [0°] (2) and several more with a single occurrence, but these esti- 
mates should be taken as suggestive at best. Recent work also shows that 
the relative frequency of dental fricatives in languages today (as given by 
PHOIBLE) is lower than what it was in the past (as given by BDPROTO), 
whereas, for example, the labiodental fricatives show the opposite trend, 
becoming much more common now than in the past (Moran, Lester, and 
Grossman 2021). 
How are they borrowed?


The SegBo database contains 598 datapoints, but only a handful of cases 
where dental fricatives were borrowed (see Table 1). It can be seen that 
most cases involve single segments and an unknown origin, but also that 
Swahili [swah1253] borrowed three dental fricatives from Standard 
Arabic via loanwords, as did Nubi [nubi1253] for two dental fricatives
from Arabic or English. However, the rarity of these borrowings should 
be seen in the context of the overall rarity of dental fricatives and of the 
nature of SegBo, which is a convenience sample that is neither genealogically
nor areally balanced.


How are they transmitted vertically? 


Here we use phylogenetic methods to investigate the diachronic evolu- 
tion of dental fricatives, more precisely to estimate the way languages 
gain or lose such sounds through time, as well as the probabilities that 
ancient proto-languages might have had such sounds. However, such 
methods are fruitfully applicable only for large language families where 
enough languages (and, if possible, earlier proto-languages) have non- 
missing data for the traits of interest (here, the presence or absence of 
dental fricatives in their inventory). In practice, this means that we only 
investigated the Indo-European and the Sino-Tibetan languages (N.B. 
other families might be amenable to such analyses, but we leave this for 
future research). For Indo-European, we used the phylogeny published 
by Chang et al. (2015), and for Sino-Tibetan that published by Zhang et 
al. (2019), both available in D-PLACE (Kirby et al. 2016). Matching the 
available data for the presence/absence of dental fricatives in the languages
of these families results in the pruned phylogenetic trees shown in Figures 5
and 6, where, for each language with data, we show if it has (blue) or does
not have (red) dental fricatives. While for Indo-European there are quite a
few languages with dental fricatives, for Sino-Tibetan there are only two such
languages: S’gaw Karen [sgaw1245] and Burmese [nucl1310]; Burmese reportedly
has dental fricatives /θ/ and /ð/, and S’gaw Karen (also spoken in Myanmar and
Thailand) reportedly has a voiceless dental fricative /θ/, but more research
is needed about their phonetics and history. For both families, there is an
overwhelming probability that dental fricatives were not present in their
proto-languages (~80% for Indo-European and ~95% for Sino-Tibetan), but
emerged relatively recently and in a rather “patchy” manner (especially in
Sino-Tibetan); see Figures 5 and 6.

Table 1.
Cases of dental fricative borrowing contained in the SegBo database. The
source (when known) and destination languages are represented by their
Glottocode, and are: aika1237 (Aikana, an isolate spoken in Brazil), stan1288
(Spanish), stan1318 (Standard Arabic) and stan1293 (English) as sources, and
kwaz1243 (Kwaza, an isolate), kumi1248 (Tipai, a Cochimi-Yuman language),
mapu1245 (Mapudungun, an Araucanian language), sout2991 (South Bolivian
Quechua), chal1275 (Chaldean Neo-Aramaic, an Afro-Asiatic language), nucl1706
(Neo-Mandaic, an Afro-Asiatic language), swah1253 (Swahili, an Atlantic-Congo
language), nubi1253 (Nubi, an Afro-Asiatic language), chiq1250 (Chiquihuitlan
Mazatec, an Otomanguean language), lish1247 (Lishana Deni, an Afro-Asiatic
language), luch1239 (Luchazi, an Atlantic-Congo language), mlah1239 (Mlahsö,
an Afro-Asiatic language), para1311 (Paraguayan Guarani, a Tupian language),
teop1238 (Teop, an Austronesian language), and turo1239 (Turoyo, an
Afro-Asiatic language).

Destination                          Source                                            Segments
kwaz1243 (Kwaza)                     aika1237 (Aikana)                                 Ö
kumi1248 (Tipai)                     stan1288 (Spanish)                                fò]
mapu1245 (Mapudungun)                stan1288 (Spanish)                                fò]
sout2991 (South Bolivian Quechua)    stan1288 (Spanish)                                fe)
chal1275 (Chaldean Neo-Aramaic)      stan1318 (Standard Arabic)                        oy
chal1275 (Chaldean Neo-Aramaic)      ?                                                 oy
nucl1706 (Neo-Mandaic)               stan1318 (Standard Arabic)                        ös
swah1253 (Swahili)                   stan1318 (Standard Arabic)                        0, OY, 8
nubi1253 (Nubi)                      stan1318 (Standard Arabic), stan1293 (English)    6,8
chiq1250 (Chiquihuitlan Mazatec)     ?                                                 fe)
lish1247 (Lishana Deni)              ?                                                 roll
luch1239 (Luchazi)                   ?                                                 0, 8
mlah1239 (Mlahsö)                    ?                                                 O°
para1311 (Paraguayan Guarani)        ?
teop1238 (Teop)                      ?
turo1239 (Turoyo)                    ?                                                 O°

Fig. 5.
Indo-European phylogeny (pruned to the languages with data) showing which
languages (terminal nodes) have (blue) and which do not have (red) dental
fricatives in their inventories, as well as the reconstructed probability of
dental fricatives in the history of this family (branches), from 0% (red) to
100% (blue).


Does vocal tract anatomy influence their acquisition and production? 
Exploratory analyses 


Figures 7 and 8 show the various patterns of /θ/ and /ð/ productions per
participant, respectively, while Tables 2 and 3 summarize them by sex and
group.

For /θ/, it can be seen that the majority of participants are able to produce
[θ] to some extent, although there are a fair number of participants (mostly
from the North and South Indian groups) that never produced it at all, and
that many participants do not stick to using only one type of substitution.
The accuracy rates differ vastly between groups: Dutch is most accurate
overall (70.1%), followed by Chinese (64%), while the accuracy of the North
and South Indian groups is much lower (at 15.2% and 16.4%, respectively). [t]
is the most common substitution, being used to some degree across all groups,
although there is a tendency for [s] substitution in the Chinese group over
[t]. Females are markedly more accurate than males across all groups.

Fig. 6.
Sino-Tibetan phylogeny (same conventions as for Fig. 5 above).

For /ð/, more than half of the participants are able to produce it as [ð]
to some extent, although accuracy rates are lower than for /θ/ in every
group except the North Indian one. The Chinese and Dutch groups are
accurate only about half the time (52.6% and 51.4%, respectively), and the
North Indian and South Indian groups remain less accurate (at 24.1% and
13.2%, respectively). Females are generally more accurate than males. There
is less variation in the substitutions used here, with near-unanimous usage
of [d] across all groups, and minimal usage of [z], which is surprising, as
it was reported to be one of the common substitutions for the Chinese group
(Deterding 2006).
Fig. 7.

Bar plots depicting the proportions (%) of voiceless dental fricative /θ/
productions per participant, regardless of word position, showing the
participant's sex (initial letter) and group.
Table 2.

Percentage and number of occurrences of voiceless dental fricative /θ/
productions by group and sex, regardless of word position. “Other” signifies
that the segment cannot be properly categorized. The most common
substitutions are in bold.

Group         Sex  θ             t             s             f             th            other
Chinese       F    89.9% (71)    0.0% (0)      10.1% (8)     0.0% (0)      0.0% (0)      0.0% (0)
Chinese       M    52.0% (89)    15.8% (27)    26.3% (45)    0.6% (1)      2.3% (4)      2.9% (5)
Dutch         F    77.8% (336)   10.6% (46)    2.3% (10)     3.7% (16)     5.1% (22)     0.5% (2)
Dutch         M    61.5% (240)   26.4% (103)   1.3% (5)      1.8% (7)      3.3% (13)     5.6% (22)
North Indian  F    25.3% (23)    61.5% (56)    0.0% (0)      0.0% (0)      13.2% (12)    0.0% (0)
North Indian  M    11.7% (30)    79.4% (204)   0.0% (0)      0.0% (0)      7.8% (20)     1.2% (3)
South Indian  F    30.6% (19)    67.7% (42)    1.6% (1)      0.0% (0)      0.0% (0)      0.0% (0)
South Indian  M    13.9% (51)    83.1% (304)   0.3% (1)      0.0% (0)      2.2% (8)      0.5% (2)

Table 3.

Percentage and number of occurrences of voiced dental fricative /ð/
productions by group and sex, regardless of word position. “Other” signifies
that the segment cannot be properly categorized. The most common
substitutions are in bold.

Group         Sex  ð             d             t             z             θ             other
Chinese       F    48.6% (54)    49.5% (55)    0.9% (1)      0.9% (1)      0.0% (0)      0.0% (0)
Chinese       M    54.7% (110)   44.8% (90)    0.0% (0)      0.5% (1)      0.0% (0)      0.0% (0)
Dutch         F    64.9% (318)   33.9% (166)   0.2% (1)      0.2% (1)      0.6% (3)      0.2% (1)
Dutch         M    37.6% (180)   61.6% (295)   0.0% (0)      0.2% (1)      0.0% (0)      0.6% (3)
North Indian  F    37.5% (48)    48.4% (62)    14.1% (18)    0.0% (0)      0.0% (0)      0.0% (0)
North Indian  M    19.0% (64)    80.7% (272)   0.0% (0)      0.3% (1)      0.0% (0)      0.0% (0)
South Indian  F    40.8% (31)    59.2% (45)    0.0% (0)      0.0% (0)      0.0% (0)      0.0% (0)
South Indian  M    9.0% (45)     90.6% (454)   0.2% (1)      0.0% (0)      0.2% (1)      0.0% (0)
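
The group-level accuracy rates quoted in the running text can be recovered from the counts in Tables 2 and 3 by pooling the two sexes within each group. The Python sketch below illustrates this arithmetic. The counts are copied from the tables; the names `TABLE_2`, `TABLE_3` and `pooled_accuracy` are illustrative only and are not part of the original analysis.

```python
# Counts per (group, sex), copied from Table 2 (/θ/): [θ, t, s, f, th, other].
TABLE_2 = {
    ("Chinese", "F"): [71, 0, 8, 0, 0, 0],
    ("Chinese", "M"): [89, 27, 45, 1, 4, 5],
    ("Dutch", "F"): [336, 46, 10, 16, 22, 2],
    ("Dutch", "M"): [240, 103, 5, 7, 13, 22],
    ("North Indian", "F"): [23, 56, 0, 0, 12, 0],
    ("North Indian", "M"): [30, 204, 0, 0, 20, 3],
    ("South Indian", "F"): [19, 42, 1, 0, 0, 0],
    ("South Indian", "M"): [51, 304, 1, 0, 8, 2],
}

# Counts per (group, sex), copied from Table 3 (/ð/): [ð, d, t, z, θ, other].
TABLE_3 = {
    ("Chinese", "F"): [54, 55, 1, 1, 0, 0],
    ("Chinese", "M"): [110, 90, 0, 1, 0, 0],
    ("Dutch", "F"): [318, 166, 1, 1, 3, 1],
    ("Dutch", "M"): [180, 295, 0, 1, 0, 3],
    ("North Indian", "F"): [48, 62, 18, 0, 0, 0],
    ("North Indian", "M"): [64, 272, 0, 1, 0, 0],
    ("South Indian", "F"): [31, 45, 0, 0, 0, 0],
    ("South Indian", "M"): [45, 454, 1, 0, 1, 0],
}


def pooled_accuracy(table):
    """Return {group: % target-like productions}, pooling both sexes.

    The first count in each row is the target-like realization.
    """
    hits, totals = {}, {}
    for (group, _sex), counts in table.items():
        hits[group] = hits.get(group, 0) + counts[0]
        totals[group] = totals.get(group, 0) + sum(counts)
    return {group: round(100 * hits[group] / totals[group], 1) for group in hits}


theta_accuracy = pooled_accuracy(TABLE_2)
eth_accuracy = pooled_accuracy(TABLE_3)
```

Pooling in this way gives 64.0%, 70.1%, 15.2% and 16.4% for /θ/, and 52.6%, 51.4%, 24.1% and 13.2% for /ð/ (Chinese, Dutch, North Indian, South Indian), matching the rates reported in the text.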


Principal Component Analysis 


Given that some of the anatomical measures are highly inter-correlated, 
we also performed a Principal Component Analysis (PCA) on them. The 
first 13 PCs explain 90% of the variation, but there is a relatively steep 
drop in the explained variance after the first 4 components: PC1 explains 
16.9% of the variance, PC2 15.3%, PC3 13.8%, PC4 11.1%, and PC5 
only 6.6%. 
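
To make the "steep drop" concrete, the reported percentages can be accumulated and differenced. This is a small arithmetic sketch using only the five values given above; the variable names are illustrative.

```python
from itertools import accumulate

# Percent variance explained by PC1..PC5, as reported in the text.
reported = [16.9, 15.3, 13.8, 11.1, 6.6]

# Running total of explained variance after each component.
cumulative = [round(total, 1) for total in accumulate(reported)]

# Decrease in explained variance from one component to the next.
drops = [round(a - b, 1) for a, b in zip(reported, reported[1:])]
```

The first four components together account for 57.1% of the variance, and the decrease from PC4 to PC5 (4.5 percentage points) is clearly larger than any of the preceding decreases (1.6, 1.5 and 2.7).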


Inter-rater agreement 
We estimated the agreement between the two raters using: (1) the percent
agreement (91.6% for [θ] and 81.3% for [ð]), (2) Cohen’s κ (0.83 for [θ]
and 0.64 for [ð], both significantly different from 0), (3) Krippendorff’s
α (0.83 for [θ] and 0.63 for [ð]), and (4) ICC (the intraclass correlation
coefficient with only the productions as random effects but not the raters;
0.83 with 95% confidence interval [0.82, 0.84] for [θ] and 0.63 with 95%
CI [0.60, 0.65] for [ð], both significantly different from 0). It is clear
that [θ] has a much higher agreement than [ð], but both show substantial
agreement (e.g., McHugh (2012) suggests 80% as the minimum for percent
agreement, and treats Cohen’s κ values higher than 0.6 as showing
moderate agreement and values above 0.8 as showing strong agreement).
However, given that we used the checked scores in the actual analyses
(i.e., those of the second rater), these inter-rater agreements do not have
the usual interpretation, but should be taken rather as the degree of
consistency between the first pass coding and the final coding.
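
For readers unfamiliar with these measures, the following Python sketch shows how percent agreement and Cohen's κ are computed from two raters' labels. It is a minimal illustration, not the code used in the study: the ten labels are invented toy data and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater1, rater2):
    """Proportion of items on which the two raters give the same label."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater1)
    p_observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the raters' marginal label proportions.
    p_expected = sum(
        counts1[label] * counts2[label] for label in set(counts1) | set(counts2)
    ) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Invented toy codings of ten productions (hypothetical data).
first_pass = ["θ", "θ", "t", "t", "s", "θ", "t", "θ", "t", "s"]
final_pass = ["θ", "θ", "t", "θ", "s", "θ", "t", "θ", "t", "t"]

agreement = percent_agreement(first_pass, final_pass)
kappa = cohens_kappa(first_pass, final_pass)
```

In this toy case the raters agree on 8 of 10 items, but κ is lower (about 0.68) because some of that agreement is expected by chance given how often each rater uses each label.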


Fitting the IVs one by one 


Fitting the IVs one by one suggests that while there is quite solid
evidence of anatomical influences on θ (especially concerning
lowIA, jTotPC3, PC9 and PC10), for ð the evidence is very circumstantial;
combining θ and ð suggests some anatomical effects (especially of
jLowPC3). Looking at the non-anatomical factors, sex (females are
significantly better), phonexp (has a positive effect) and group clearly
affect performance. For the latter, there seem to be no differences between
North and South Indians, or between Dutch and Chinese, but there are
significant differences between Dutch and Chinese on the one hand (better
performance) and North (except for ð) and South Indians on the other.
Also, there is a clear difference between the two sounds (θ has
significantly better performance than ð).


Model comparisons 


Comparing the models with and without aIVs, which tests the overall
contribution of those IVs capturing aspects of vocal tract anatomy, seems to
suggest that vocal tract anatomy contributes to predicting success, clearly
so for [θ] and for the two sounds combined, but less convincingly for [ð],
and more so when using the frequentist approach than when using the
Bayesian approach.
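
The text does not state which comparison statistics were used. As one common frequentist way of comparing nested models with and without a block of predictors, the sketch below computes a likelihood-ratio statistic and an AIC difference. All log-likelihoods and parameter counts are hypothetical values chosen for illustration, and the closed-form tail probability is valid for even degrees of freedom only.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(
        half**i / math.factorial(i) for i in range(df // 2)
    )


def aic(loglik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical fits: a model without and a model with six anatomical IVs.
ll_without, k_without = -520.0, 6
ll_with, k_with = -505.0, 12

stat = lrt_statistic(ll_without, ll_with)
p_value = chi2_sf_even_df(stat, k_with - k_without)
delta_aic = aic(ll_with, k_with) - aic(ll_without, k_without)
```

With these invented numbers the anatomical block improves the fit (a statistic of 30 on 6 degrees of freedom, and an AIC lower by 18); a Bayesian comparison would instead rely on quantities such as Bayes factors or predictive criteria.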


Model simplification 


Please note that all the details about the model simplifications, including 
the order of removing predictors and the reasons therefor, are given in the 
ScottMoisik/DentalFricGit GitHub repository, particularly in the HTML 
analysis report. 

In the frequentist approach, for [θ], several aIVs predict success, in
particular m2Height, C.P2w and jLowPC3 (having a negative effect,
denoted henceforth as “−”), and lowIA, slMDepth and aRA (positive
effect or “+”); likewise, using the Principal Components, PC7 (−) and
PC3 (+) help predict success; using both types of IVs, the conditional
R² > 85%. In contrast, for [ð] the evidence is much weaker (but still
concordant with that for [θ]): m2Height, C.P2w, jLowPC3 and pRoofA (−),
with a conditional R² > 82%. Combining the two sounds finds m2Height (−)
and slMDepth (+), and PC7 (−) and PC3 (+), with a conditional
R² > 75%. Analyzing the four groups separately drastically reduces the
power, and we applied no multiple testing correction across the groups, so
these results should be taken as suggestive at best. For the Dutch, phonexp
seems to have a positive effect for both sounds; for the North Indians,
engprof possibly has a positive effect for [ð]; for the South Indians and
the Chinese, there are weak hints that vocal tract anatomy might affect
both sounds.

In the Bayesian approach, for [θ], several aIVs predict success, in
particular m2Height, C.P2w and jLowPC3 (−), and lowIA and slMDepth
(+); PC7, PC8 and PC10 (−) and PC5 (+). For [ð] there is evidence for a
negative effect of m2Height and a positive effect of PC11. Combining the
two sounds finds effects for C.P2w (−) and slMDepth (+), and PC8 (−), and
PC5 and PC11 (+), respectively. Analyzing the four groups separately:
for the Dutch, for both sounds phonexp (+), and there are some hints of
effects of vocal tract anatomy: for [θ], PC7 (−); for [ð], jLowPC2 (−); and
for the combined sounds, slMDepth (+), jLowPC2 (−) and PC12 (−). For
the North Indians, for [θ], engprof, weight, PC11 and PC13 (+) and
pWid, m2Height, overjet, PC6, PC9 and PC10 (−); for [ð], engprof,
weight, PC2 and sex (female) (+), and m2Height (−); when combining
the sounds, sex (female), phonexp and PC6 (+) and m2Height, PC7, PC8
and PC13 (−). For the South Indians, there are some hints that vocal tract
anatomy affects both sounds. For the Chinese, for [θ], pHLRat and
antPAreaR (−), and aRA, pRoofA, sex (female), engprof, PCS and PCS (+);
for [ð], cWidth, PC8 and PC12 (+), and lowIA, aRHeight, antPArea,
antPAreaR and PC13 (−).


Power analysis 


For both [θ] and [ð] we focused on the effects of C.P2w, which, in the
reduced models, has an effect β of about −1.0 in both cases. The observed
(post-hoc) power (for an α-level of 0.05) is 75.6% with a 95% CI (72.8%,
78.2%) for [θ], and 41.9% (38.8%, 45.0%) for [ð]. Changing the number
of participants (respecting the distribution by sex and group) suggests
that we would need about 85 participants to achieve a power 1 − β = 80%
and about 120 participants to achieve 90% for [θ], but about 220
participants and about 350 participants, respectively, for [ð].
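
The text does not say how the power estimates and their confidence intervals were obtained. If they came from a simulation study, a power estimate is simply the proportion of simulated datasets in which the effect is detected, and its interval is a binomial confidence interval. The sketch below illustrates this under two assumptions that are ours, not the source's: 1000 simulations and a normal-approximation (Wald) interval. The function name is hypothetical.

```python
import math


def simulated_power_ci(n_detected, n_sims, z=1.959964):
    """Power estimate and normal-approximation 95% CI from simulation counts."""
    power = n_detected / n_sims
    half_width = z * math.sqrt(power * (1.0 - power) / n_sims)
    return power, max(0.0, power - half_width), min(1.0, power + half_width)


# Assumed counts: 756 and 419 detections out of an assumed 1000 simulations.
power_theta = simulated_power_ci(756, 1000)
power_eth = simulated_power_ci(419, 1000)
```

Under these assumptions the intervals come out at roughly (72.9%, 78.3%) and (38.8%, 45.0%), close to but not identical with the reported ones, which suggests a slightly different interval method was used.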


DISCUSSION AND CONCLUSIONS 
Our cross-linguistic analyses seem to confirm that the dental fricatives
are relatively rare among the present-day languages, being attested in
only about 10% of these, spread across all macroareas. Moreover, the
majority of the languages that do contain dental fricatives in their
phonological inventory have only one dental fricative. Globally, there are
slightly more languages with voiceless than with voiced dental fricatives.
This present-day situation seems not to be very different from what can
be inferred about the past, with only about 11% of ancient and
reconstructed languages having such sounds in their inventory. Apparently,
dental fricatives are very rarely borrowed between languages, but this has
to be taken with a grain of salt given their overall rarity. When segments
are typologically very common, e.g., [m, k, i], or very rare, e.g., they
appear in only one or a few languages, they are less likely to be borrowed
because most languages already have the segment or because the
segment is rarely in a contact situation to induce borrowing (Eisen 2019).
And if the results for the two large language families that we could analyze
phylogenetically (Indo-European and Sino-Tibetan) are to be taken at
face value, then dental fricatives seem to emerge relatively seldom and to
disappear quickly. These findings seem to support the suggestion by
Nichols (2017) that these sounds are rare because they are ultimately prone
to replacement. However, it is interesting to note that dental fricatives do
occur, and have been retained, in (at least some of the dialects of) some
of the languages with very large numbers of speakers (and which enjoy
high prestige), including English, Castilian Spanish, and Standard
Arabic.

Switching now to their acquisition and patterns of substitution,
studies of both first (L1) and second (L2) language acquisition highlight
that the acquisition of dental fricatives is difficult. During L1
acquisition, dental fricatives are often replaced by other speech sounds as
they are difficult for children to articulate (Laitman et al. 2014).
Difficulties in their production and perception are well documented in L2
acquisition as well. In particular, L2 speakers of English whose L1
inventory lacks these sounds generally face difficulties enunciating them
properly in English. For these speakers, dental fricatives are often
realized as, or substituted with, other articulatorily or perceptually
similar phonemes. The alveolar (or sometimes dental) stops [t, d] are by far
the most common, followed by the alveolar fricatives [s, z], and,
occasionally, by the voiceless labiodental fricative [f] (see Table 4). Note
that this distribution of substitution patterns roughly parallels the
changes to dental fricatives that we see in varieties of English, with stops
being the most common replacement (Blevins 2006: 11) alongside the
occasional appearance of labiodentals (as occurs word-finally in Singapore
English, where “with”, for example, is commonly realized as [wif]; see
Moorthy and Deterding 2000). However, despite extensive research and a wide
range of proposed explanations, the differential substitution of dental
fricatives is still not fully understood.

We briefly review below several existing theories concerning the dif- 
ferential substitution of dental fricatives. 

Orthographic influence. It has been suggested that orthography may
play a role (Brannen 2002; Paradis and LaCharité 2012). Although L1
English speakers may produce a dental fricative almost intuitively when
they see the ⟨th⟩ digraph, this may not necessarily be the case for
speakers of other L1s. For example, ⟨th⟩ normally represents the alveolar
stop /t/ in French (e.g., bibliothèque “library” /bi.bli.jo.ˈtek/), and
similarly for Dutch (e.g., thema “theme” /ˈte.ma/). Accordingly, L1 speakers
of these languages may have a tendency to realize the ⟨th⟩ digraph in
English as [t] as well (Brannen 2002; Wester et al. 2007). One objection
to this proposal is that the voiced English dental fricative is also
represented by the same digraph ⟨th⟩, and yet /ð/ is hardly ever substituted
with [t] (see Table 4). Moreover, this proposal is even less applicable to
speakers whose L1 does not use the ⟨th⟩ digraph, such as Japanese and
Russian.

Table 4.

Commonly reported primary substitutions of dental fricatives for speakers of
various L1s in word-initial position. Presence of the diacritic [ʰ]
indicates that aspiration is used, and [◌̪] indicates that the place of
articulation is dental.

L1                                 Primary substitution(s)  References
Dutch                              [θ] > [t]                Wester, Gilbers, and Lowie (2007)
                                   [ð] > [d]                Hanulikova and Weber (2010)
Mandarin Chinese                   [θ] > [s]                Deterding (2006)
                                   [ð] > [d] / [z]          Rau, Chang, and Tarone (2009)
Cantonese                          [θ] > [f]                Bolton and Kwok (1990)
                                   [ð] > [d]                Deterding, Wong, and Kirkpatrick (2008)
Indo-Aryan (Hindi, Urdu, etc.)     [θ] > [tʰ] / [t]         Shackle (2001)
                                   [ð] > [d]                Syed (2013)
Dravidian (Tamil, Kannada, etc.)   [θ] > [t] / [t]          Bhatt (1995)
                                   [ð] > [d]                Narasimhan (2001)
Standard German                    [θ] > [s]                Hanulikova and Weber (2010)
                                   [ð] > [z]                Swan (2001)
Swiss German                       [θ] > [f]                Graeppi and Leemann (2019)
                                   [ð] > [d]                Graeppi and Leemann (2019)
Quebec French                      [θ] > [t]                Gatbonton (1978)
                                   [ð] > [d]                morrison_dat_2005
European French                    [θ] > [s]                Picard (2002)
                                   [ð] > [z]                Brannen (2002)
Japanese                           [θ] > […]                Thompson (2001)
                                   [ð] > […]                Brannen (2002)
Russian                            [θ] > […]                Teasdale (1997)
                                   [ð] > […]                Monk and Burak (2001)

Perception-based second language (L2) theories. General models
of L2 speech acquisition, such as the Perceptual Assimilation Model
(PAM; Best 1994) and the Speech Learning Model (SLM; Flege 1995),
posit that “L2 phonetic segments can only be produced as accurately as
they are perceived” (Flege 2003), emphasizing the correlation between
perception and production. The SLM, for example, classifies L2 sounds
into 3 categories: “new,” “similar,” and “identical,” with “similar”
sounds being the most problematic: if learners fail to discriminate
the L2 sound from the “similar” L1 sound, they will merge the
two sounds into the same phonemic category (termed “equivalence
classification” (Flege 1986)), and produce them as such. Although these
theories have some empirical basis (Aoyama et al. 2004; Best, McRoberts,
and Goodell 2001; Flege 1986), most notably in the case of the /r/-/l/
merger in Japanese, other studies have identified an asymmetry between
perception and production in the case of dental fricatives. For instance,
high rates of accuracy in discrimination tasks did not necessarily equate
to matching rates of accurate production (Reis 2006; Syed 2013): even
when perceptual errors were made, these did not correspond with
production errors. Broadly speaking, the labiodental fricative /f/ has been
found to cause the greatest perceptual confusion with the voiceless dental
fricative /θ/ due to their acoustic similarities (Brannen 2002; Cutler et al.
2004; Reis 2006), and thus, these two sounds should have been
categorized as the same phoneme according to the SLM, but, as discussed by
Hanulikova and Weber (2010), [f] is not the most common substitution
among L2 English speakers, not even when /f/ exists in their L1 phoneme
inventory.

Phonological approaches. There are several phonological accounts
within various theoretical frameworks, including the Feature Competition
Model (Hancin-Bhatt 1994), the Underspecification Theory (Weinberger
1997), and the Auditory Distance Model (Brannen 2011), to name
a few. These theories are largely generative in nature, and mainly revolve
around the saliency or markedness of abstract phonological features. For
example, under the Feature Competition Model, Hancin-Bhatt (1994)
identified that the feature [+continuant] marks a significant number of
contrasts in German and, thus, the relative prominence of this feature is
calculated to be high. Consequently, speakers of German will pay
particular attention to the feature [+continuant] in their perception of the
English dental fricative, causing them to map it onto their fricative [s]
([+continuant]) instead of [t] ([−continuant]). The problem with most of
these theories is that they are multi-layered and fairly convoluted,
involving complicated algorithms to determine feature prominence, functional
load, auditory distance, etc. Despite this complexity, these algorithms are
not widely generalizable, i.e., they work for the languages under scrutiny
but not for others, failures that are often brushed off as mere
shortcomings. Additionally, Picard (2002) states that one difficulty with
such theories is that they often constitute an “all-or-nothing proposition,”
which is problematic, as there is evidence that while certain substitutions
are predominant, these substitutions are almost never exclusive (Hanulikova
and Weber 2010; Rau et al. 2009; Wester et al. 2007). They also fail to
account for instances where the substitution of English dental fricatives is
contextually variable, i.e., different substitutions are used depending on
syllable position (Wester et al. 2007).

Phonetic approaches. Research on the differential substitution of 
dental fricatives was, and remains, dominated by phonological theories, 
on the ostensible premise that phonetics cannot successfully predict the 
substitutions, even though it has been acknowledged that the problem 
itself is phonetic in nature (Altenberg and Vago 1983). While research 
undertaking a more phonetic-based approach is still scarce, there are a 
handful of studies that have diverged from the phonological viewpoint. 
One phonetic-based insight on the issue was put forth by Teasdale
(1997), who hypothesized that a speaker will “choose” to substitute
dental fricatives with fricatives or stops depending on their tongue
position in the production of the phoneme /s/. It was predicted that a
speaker of a language using a dental [s] will tend to substitute dental
fricatives with [s] in English, whereas a speaker of a language using an
alveolar [s] will substitute them with [t]. Using noise cut-off as a basis for
inferring place of articulation, it was found that European French (EF), 
an [s] substituting language, did indeed have a more dental [s] with a cut- 
off at 5.6 kHz. On the other hand, Quebec French (QF), a [t] substituting 
language, possessed a more alveolar [s], with a considerably lower cut- 
off point at 4.3 kHz. The main strength of this approach is that it seems to 
be able to account for the classic dilemma that phonological approaches 
could not: the differential substitution of EF versus QF, two similar lan- 
guage varieties with near-identical underlying language inventories, and 
yet substituting dental fricatives differently. Although Teasdale’s (1997) 
approach has been hailed as being “much more promising” than earlier 
phonological approaches (Picard 2002), it is not without limitations. For 
example, within her own study, the hypothesis was not upheld for Japa- 
nese and Russian, and noise cut-off was not always a reliable indicator of 
place of articulation. Additionally, the hypothesis did not yet include lan- 
guages that substitute dental fricatives with [f], such as Cantonese (Bol- 
ton and Kwok 1990). 

Clearly, there appears to be no simple, straightforward answer to the 
question of dental fricative differential substitution in L2 speakers. Other 
potential confounding factors also exist, including phonetic (neighboring 
sounds) and stylistic (level of formality) constraints (Dickerson and 
Dickerson 1977), as well as age of L2 acquisition (Cornwell and Rafat 
2017) and L2 proficiency (Reis 2006), that are not discussed here due to 
their scarcity in the literature. Therefore, the substitution of dental
fricatives is probably due to a combination of multiple factors, some of
which might still not be studied well enough to quantify their influence:
this is why we have investigated here the potential influence of details of
vocal tract anatomy on inter-individual variation. This latter point is not
as unexpected as it might look at first sight (Dediu, Janssen, and Moisik
2017): the articulators are usually idealized and thought of as being
essentially uniform across our species, even though, like anything else
biological, there is extensive, normal (non-clinical) variation in nearly any
aspect of the vocal tract, on an inter-individual level, and possibly even
on an inter-population level. Variation in the morphology of the vocal
tract can have a nuanced but appreciable impact on speech production, as
several empirical studies have shown. Brunner, Fuchs, and Perrier 
(2009), in their study on the effect of different palate shapes on acoustic 
and articulatory variability, observed that speakers with flat palates had 
to greatly limit their articulatory variability to counteract the natural arti- 
culo-acoustic influence of palate flattening, so as to sustain the same 
acoustic variability as speakers with dome-shaped palates. These results 
demonstrate that, in order to preserve the acoustic correlates of the per- 
ception of certain sounds, speakers “specifically adapt their articulatory 
variability to their morphology” (Brunner et al. 2009). In another study, 
Moisik and Dediu (2017) set out to investigate whether the lack of a 
prominent alveolar ridge, which occurs more frequently among speakers 
of Khoisan-type languages, aids in the production of clicks. Their results 
indicate that having a smooth palate may provide an articulatory “bias” in 
the form of decreased articulatory effort and improved volume change 
characteristics, as compared to having a larger alveolar ridge. More 
recently, Dediu and Moisik (2019) also showed that the use of either “ret- 
roflexed” or “bunched” strategies by ESL speakers in producing the 
North American English /r/ reflects differences in bracing based on vari- 
ation in the underlying anterior vocal tract anatomy, with narrower pal- 
ates favoring a “bunched” configuration; although their results are 
largely preliminary, they do suggest an effect of anatomy. Blasi et al. 
(2019), building on an earlier suggestion by Hockett (1985), show that 
the type of “edge-to-edge” bite more frequent among populations
practicing hunting and gathering (as opposed to overjet and overbite, more
frequent among populations practicing agriculture) generates a negative
bias against labiodental sounds (such as /f/ and /v/). While this
anatomical difference is acquired during the lifespan and is ultimately due
to overall differences in food consistency between the two types of diets,
it does suggest that details of vocal tract anatomy affect large-scale
cross-linguistic variation. Taken together, these studies illustrate that anatomical
factors can influence speech production, whether in terms of ease of pro- 
duction, articulatory strategies required, or otherwise. Therefore, it is not 
entirely inconceivable that there could be some physical influence on the 
production and substitution of dental fricatives as well, considering that 
extreme precision is required in the production of fricatives, so much so 
that even “a variation of one millimeter in the position of the target [...] 
makes a great deal of difference” (Ladefoged and Maddieson 1996). 
Our data presented here seem to support such a view. Firstly, we 
noticed that most participants could produce dental fricatives, albeit with 
varying degrees of accuracy. This has two main implications: (1) their 
ability to produce dental fricatives underscores that it is very unlikely that 
the problem is purely perceptual in nature, which undermines several 
perceptual theories on the matter (Best 1994; Flege 1995); and (2) dental 
fricative production is often not “all-or-nothing”: less than half of our
participants produce them 100% or 0% of the time, while everyone else
falls in between. This is generally not taken into account by many
existing theories and is probably true in many cases other than the dental
fricatives as well. It is therefore more realistic to take a probabilistic
approach, as in the present study.

Secondly, we observed a fair amount of intra-speaker variation with
regard to the substitutions used, that is, most speakers do not use only
one type of substitution (at least for /θ/), consistent with earlier
descriptive studies (Hanulikova and Weber 2010; Wester et al. 2007). We note
that this is especially true for speakers who primarily use [s]
substitutions, which are usually accompanied by [t] substitutions. This
poses a problem for theories that try to predict the substitutions used, as
the typical working assumption for these theories is that L2 English
speakers use only one type of substitution exclusively.

Third, we also observed an overwhelming tendency for alveolar
substitutions across all speakers, i.e., /θ/ being realized as [t] or [s],
/ð/ being realized as [d] or [z]. As mentioned previously, this is peculiar,
as /t/ and /s/, unlike /f/, are not known to cause much perceptual confusion
with /θ/. A possible reason for this tendency will be discussed below.

Fourth, there is a clear difference between the two sounds, with /θ/
having a higher success rate than /ð/ and being more convincingly
influenced by vocal tract anatomy.

Fifth, the control variables (i.e., those that do not capture the vocal tract
anatomy) play an important role in explaining articulatory success. When
considering each predictor individually, females (sex) are significantly
better, and more experience with phonetics (phonexp) helps. When it
comes to the four groups, for /θ/ there seems to be a partition between
(Dutch + Chinese) vs (North + South Indians), while for /ð/ the partition
is less clear. However, model simplification suggests that while phonetic
experience maintains its positive effect above and beyond other
covariates for both sounds and in all analyses, the effect of sex (females
are better) is reliably found only in the Bayesian analyses (interestingly,
English proficiency, engprof, seems to have a positive effect in the
frequentist models).

Finally, the variables that capture the vocal tract anatomy seem to
have an effect on /θ/ but much less convincingly on /ð/. For /θ/, the
frequentist and Bayesian analyses tend to agree in finding effects of
m2Height (−), C.P2w (−), jLowPC3 (−), lowIA (+) and slMDepth (+),
and, when using the Principal Components, of PC7 (−). For /ð/, both
analyses suggest an effect of m2Height (−). Combining the two sounds
finds across both analyses an effect of slMDepth (+) and possibly of
m2Height (−) or C.P2w (−). Thus, overall, it seems that m2Height has a
negative influence on success for both sounds, while slMDepth seems to
have a positive effect on /θ/.

Although these specific anatomical variables should not be taken too
literally in light of the limitations of model simplification and the
correlations between the variables (in some cases probably due to
developmental coupling and ontogenetic links), they do provide evidence to
support a general conclusion that details of the anatomy of the anterior
oral vocal tract (see Figures 9 and 10) may indeed have an impact on
(particularly voiceless) dental fricative production. Such a conclusion that
anatomy impacts production seems reasonable, taking into consideration that
the anterior teeth form an integral part of the articulatory system and play
a role in speech production in general, particularly so in the production of
dental fricatives. For instance, Narayanan, Alwan, and Haker (1995) observed
that the tongue tip/blade is more or less always in contact with the lower
incisors across all speakers during the production of dental fricatives.
This persistent contact with the lower incisors may indicate bracing of
some kind (Gick, Allen, Roewer-Després, and Stavness 2017), and it
could be that having less vertical incisors may affect this “mechanical
support” and, consequently, impact the successful production of a dental
fricative. A labial angulation of the upper incisors, too, has been found to
adversely affect the production of fricatives (Runte et al. 2001).

Should these anatomical variables hold under replication, we further
propose a possible mechanistic explanation for the tendency to use alveolar
substitutions, relating to the length and/or forwardness of the lower
jaw (slMDepth). The left image of Figure 10 shows an “ideal” anterior
vocal tract configuration, from which dental fricatives can be produced
relatively effortlessly with the tongue tip/blade close behind the upper
incisors. By contrast, a slightly more retracted/shorter lower jaw (right
image) may “bias” towards a more posterior place of articulation, from
dental to alveolar. Additionally, it is also possible that the lower incisors
do not provide enough clearance for the production of dental fricatives
(with a dental place of articulation), biasing towards an alveolar stop
production.

Fig. 9.

Two drawings depicting varying values of the lower incisor angle (lowIA).
This measures the angle (in degrees) formed by the line from LE (incisal
edge of the lower incisors) to LR (inferior midsagittal alveolar ridge
inflection point) and LRP (a plane containing LR, parallel to the bite
plane, BTP): a larger angle (right) indicates more vertical lower incisors.
As the angle becomes smaller, there is greater labial tipping of the lower
incisor; as the angle becomes larger, the incisors become more vertically
oriented. The lowIA variable correlates moderately with palate length
variables (pLen, pmLen, aRDepth, jTotPC1). It is also somewhat correlated
with alveolar ridge angle (aRA) and upper incisor angle (upperIA), such that
more vertical lower incisors tend to come with a “bulgier” alveolar ridge
(prominence) and more vertical upper incisors. We might speculate here that
there may be developmental coupling (or perhaps an ontogenetic link) between
lower incisor angulation and oral cavity length linking these variables
together.

Fig. 10.

Two drawings depicting varying distances of the sublingual margin depth
(slMDepth). This measures the distance (in mm) between LR (inferior
midsagittal alveolar ridge inflection point) and its projection on M2P
(paired landmark for the central point of the occlusal surface of the second
molar): a larger distance (left) indicates a longer mandible. The blue
dashed lines represent the tongue, and the red dashed lines show how the
tongue may be biased towards an alveolar place of articulation to produce an
alveolar stop.

Interestingly, Dediu et al. (2019) show, using Canonical Variate Anal- 
ysis (CVA) and Procrustes ANOVA with permutation on a superset of the 
same data used here, that the anatomy of the anterior vocal tract does 
vary between sexes and broad ethno-linguistic groups. Importantly, 
however, this variation is continuous and small, with substantial overlap
and many “misclassified” participants, and it only becomes apparent
when using highly multivariate datasets.

In sum, dental fricatives are rare due to a variety of factors. They are 
difficult to produce in both L1 and L2 acquisition. They are difficult to 
perceive in L2 and in borrowing situations due to their easily confusable 
acoustic properties. Despite the assumption that they should be the com- 
mon result of lenition, very few attested sound changes result in dental 
fricatives (Kümmel 2007). They are lost frequently and typically merge
with other phonemes (Kümmel 2007). Their world-wide synchronic
distribution suggests that although they may arise relatively easily (e.g., /θ/
spontaneously replaced /s/ in Spanish in the middle of the 17th century 
(Penny 2002); although more recent research suggests that the dental 
fricatives came into existence in the 1500s (Mackenzie 2022)), they are 
unstable and tend to either be not fully phonologized or quickly lost (e.g., 
they did not spread to Spanish varieties in the Americas). This suggests 
that their phonetic properties play a role in their instability and in why 
they pattern sporadically geographically and genealogically. As pointed 
out above, dental fricatives are easily confused with other sounds, par- 
ticularly non-sibilant labiodental fricatives [f, v] and alveolar or dental 
stops [t, d]. They are rarely borrowed correctly (Grossman et al. 2020) 
and pidgins and creoles almost never inherit them from their lexifiers 
(Michaelis et al. 2013). Finally, our data suggest that tiny, continuous
and overlapping patterns of variation in the anatomy of the anterior oral
vocal tract (Dediu et al. 2019; Dediu and Moisik 2019) may help explain 
their instability and geographic patterning, but it would represent only 
one weak influence among many others and much more evidence is 
needed before drawing causal conclusions of this nature (Blasi et al. 
2019; Dediu et al. 2017; Josserand et al. 2021; Moisik and Dediu 2017). 
Thus, dental fricatives are rare speech sounds for a multitude of reasons 
touching upon their articulation, acoustics and confusability with other 
speech sounds. 
APPENDIX: LANDMARKS AND ANATOMICAL MEASURES


Here we present the landmarks and anatomical measures used in this 
study.* 


Landmarks:

TS = point on the palate roof near the transverse suture and
     corresponding to the plane containing the second molar
     landmarks and orthogonal to BTP;

PH = peak height of the palate roof;

UR = superior (maxillary) midsagittal alveolar ridge inflection
     point;

LR = inferior (mandibular) midsagittal alveolar ridge inflection
     point;

UM = lingual gingival margin of the upper (maxillary) incisors;

UE = incisal edge of the upper (maxillary) incisors;

LE = incisal edge of the lower (mandibular) incisors;

M2 = paired landmark for the central point of the occlusal surface
     of the second molar (or closest approximation if molar is
     missing).


Conventions: the bite plane (BTP) was established with reference to 
the second molars in occlusal position and the central incisors judging 
from their sagittal profile. Where mentioned, a regression line refers to 
the line of best fit between the z-components (front-back; anterior-poste- 
rior) and y-components (up-down; superior-inferior) of 3D intraoral scan 
vertices drawn from within 0.5 mm on either side of the midsagittal 
plane. Some measurements are defined with reference to projections of a 
landmark onto a given plane. In addition to BTP, two further planes are
used: the peak height plane (PHP), which is the horizontal plane tangent
to PH, and M2P, the coronal plane intersecting the second molar landmarks.
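The projection-based distances and the line-to-plane angles described in these conventions can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the coordinate convention (x left-right, y up-down, z front-back) and the example coordinates are assumptions made here for demonstration.

```python
import math

def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def _norm(u):
    return math.sqrt(_dot(u, u))

def project_onto_plane(point, plane_point, plane_normal):
    """Orthogonal projection of a 3D point onto the plane that passes
    through plane_point and has normal vector plane_normal."""
    n_len = _norm(plane_normal)
    unit = tuple(c / n_len for c in plane_normal)
    signed_dist = _dot(_sub(point, plane_point), unit)
    return tuple(p - signed_dist * u for p, u in zip(point, unit))

def distance_to_plane(point, plane_point, plane_normal):
    """Distance between a landmark and its projection on a plane."""
    projection = project_onto_plane(point, plane_point, plane_normal)
    return _norm(_sub(point, projection))

def angle_to_plane_deg(start, end, plane_normal):
    """Angle (degrees, 0 to 90) between the line start->end and a plane."""
    direction = _sub(end, start)
    sine = abs(_dot(direction, plane_normal)) / (
        _norm(direction) * _norm(plane_normal)
    )
    return math.degrees(math.asin(min(1.0, sine)))

# Hypothetical coordinates (mm): a horizontal bite plane through the
# origin and a peak-height landmark 18 mm above it.
BTP_POINT, BTP_NORMAL = (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
PH = (0.0, 18.0, 30.0)
p_height = distance_to_plane(PH, BTP_POINT, BTP_NORMAL)  # a pHeight-style measure
```

A distance-type measure such as pHeight corresponds to `distance_to_plane`, whereas an angle-type measure such as lowIA would use `angle_to_plane_deg` with a line fitted to the relevant surface; the regression-line fitting itself is not shown here.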


* Please also see Dediu et al. (2019) and the supplementary materials of
Dediu and Moisik (2019), freely available at doi:10.5281/zenodo.1481941,
for more details.
Anatomical measures:

Name (and ID): Definition

palateHeight (pHeight): [This and the next 2 measures describe the hard
    palate’s linear dimensions] Distance (mm) between PH and its
    projection on BTP: larger means higher palate

palateWidth (pWid): Distance (mm) between the intersections of the
    lingual surfaces of the second molars with the line formed by the M2
    landmarks: larger means wider palate

palateLength (pLen): Distance (mm) between UM and its projection on M2P:
    larger means longer palate

palateWidthToHeightRatio (pWHRat): pWid / pHeight: larger means wider and
    flatter palate

palateHeightToLengthRatio (pHLRat): pHeight / pLen: larger means higher
    and shorter palate

heightCanines (cHeight): Height (mm) of the hard palate at the midpoint
    of the segment connecting the two upper canines: smaller means a
    bigger alveolar ridge, but this depends also on the overall shape of
    the jaws

widthCanines (cWidth): Width (mm) at the same point as cHeight: larger
    means a wider front hard palate

heightPM2 (pm2Height): Height (mm) of the hard palate at the midpoint of
    the segment connecting the two upper second pre-molars: larger means
    a taller hard palate

widthPM2 (pm2Width): Width (mm) at the same point as pm2Height: larger
    means a wider hard palate

lengthPM2 (pm2Length): Length (mm) of the hard palate from the midpoint
    of the segment connecting the two upper second pre-molars to the
    lingual gingival margin of the maxillary central incisors

heightM2 (m2Height): Height (mm) of the hard palate at the midpoint of
    the segment connecting the two upper second molars: larger means a
    higher hard palate

widthM2 (m2Width): Width (mm) at the same point as m2Height: larger means
    a wider hard palate

Ratio cWidth / pm2Width (c.P2w): A continuous measure of “V” (closer to
    0) vs “U” (closer to 1) shape of the maxillary dental arch

Ratio cWidth / m2Width (c.M2w): A continuous measure of “V” (closer to 0)
    vs “U” (closer to 1) shape of the maxillary dental arch

Ratio cLength / pm2Length (c.P2l): A continuous measure of the ratio of
    palate lengths using the cLength and pm2Length variables, which were
    excluded during pre-thinning to manage multicollinearity; each takes
    the length between their respective reference points (canines and
    second pre-molars) and the central maxillary incisors (point UM)

overjet (overjet): Horizontal distance between the incisal edges of the
    upper and lower incisors: larger tends to be associated with longer
    upper than lower jaws

overbite (overbite): Vertical distance between the incisal edges of the
    upper and lower incisors: larger means lower incisors overlap behind
    the upper with the jaw in occlusal position

lowerIncisorAngle (lowIA): Angle (degrees) of the lower incisors
    (regression line from LE to LR): larger means more vertical lower
    incisors

upperIncisorAngle (upIA): Angle (degrees) formed by the posterior surface
    of the upper incisors (regression line from UE to UM) and BTP: bigger
    means more vertical upper incisors

alveolarRidgeHeight (aRHeight): [This and the following describe the
    alveolar ridge’s linear dimensions] Distance (mm) between UR and its
    projection on PHP: larger means a bigger alveolar ridge

alveolarRidgeDepth (aRDepth): Distance (mm) between UR and its projection
    on M2P: larger means a longer palate as gauged at the alveolar ridge
    point

sublingualMarginHeight (slMHeight): Distance (mm) between LR and its
    projection on PHP: larger means higher sublingual mandibular ridge
    and a tighter anterior mouth space

sublingualMarginDepth (slMDepth): Distance (mm) between LR and its
    projection on M2P: larger means longer mandible

anteriorPalateArea (antPArea): [This and the following describe the
    anterior palate area] The area defined by BTP, a straight line rising
    to PH, the line from PH to UM, and a straight line from UM to BTP:
    larger means a more domed and expansive anterior hard palate in
    absolute terms

anteriorPalateAreaToRectangleRatio (antPAreaR): [See immediately above]
    Ratio formed with reference to rectangular area corresponding to the
    landmarks defining the anterior palate area: larger means a more
    domed palate with a steep alveolar ridge and palate transition and a
    more posterior PH location in relative terms

availableTongueTipArea (ttAvailA): The area defined by LRP, the straight
    line formed between UR and its projection on LRP, the line from UR to
    UE, and the projection of UE on LRP minus the area formed by the line
    from LR to LE, the straight line from LE to LRP, and the line formed
    by the intersection of this previous segment to LR: larger means a
    bigger space under the alveolar ridge for the tongue tip in absolute
    terms but factoring the presence of the lower teeth in occlusal
    position

alveolarRidgeAngle (aRA): Angle (degrees) between BTP and the alveolar
    ridge (regression line from UM to UR): smaller means a more prominent
    alveolar ridge (bigger shelf)

palateTransitionAngle (pTransA): Angle (degrees) between BTP and the
    post-alveolar slope (regression line from UR to PH): larger means a
    more abruptly rising front palate

palateRoofAngle (pRoofA): Angle (degrees) of posterior palate defined
    between the palate roof (regression line from TS to PH) and BTP:
    larger means a more domed palate

jawTotalPC1 (jTotPC1): [This and the next 2 measures are obtained from
    PCA on the entire intraoral scan (upper and lower jaws)] First
    principal component for lower and upper jaws (explained variance =
    17.6%). Notable positive correlations with pLen, aRDepth, slMDepth,
    and pm2Width (and weakly positive with jLowPC2). Larger values have
    greater antero-posterior scaling and more anterior upper relative to
    lower jaw positioning

jawTotalPC2 (jTotPC2): Second principal component for lower and upper
    jaws (explained variance = 11.8%). Notable positive correlations with
    jLowPC2, upIA, aRDepth, overbite; negative correlations with ttAvailA
    and width variables (cWidth, pm2Width, and m2Width). Larger values
    correspond with overall smaller jaws (in all dimensions) but also
    greater overbite and overjet

jawTotalPC3 (jTotPC3): Third principal component for lower and upper jaws
    (explained variance = 6.2%). Notable positive correlations with
    slMDepth and pRoofA; negatively correlated with pWid, pWHRat,
    overbite, and especially jLowPC3. Larger values have taller
    dentition, with more labially-tipped (less vertical) lower and upper
    incisors

jawLowerPC2 (jLowPC2): [This and the next measure are obtained from PCA
    on the lower jaw portion of the intraoral scan] Second principal
    component for lower jaw (explained variance = 10.1%). Notable
    positive correlations with aRDepth and especially jTotPC2; negative
    correlations with ttAvailA and pWHRat. Larger values indicate
    shorter, smaller, and more narrow (‘U-shaped’) lower jaws with
    greater overjet and overbite. Smaller values appear to approach zero
    overjet and overbite

jawLowerPC3 (jLowPC3): Third principal component for lower jaw (explained
    variance = 7.5%). Notable positive correlations with pWid; negative
    correlations with especially jTotPC3. Larger values show more
    labially oriented lower dentition with considerably less overbite and
    overjet
Other variables:

Name (and ID): Definition

Unique identifier (ID): The (anonymized) unique participant identifier

English.proficiency (engprof): Self-declared level of formal English
    proficiency on an 11-step Likert scale from “none” to “native
    speaker”

phoneticExperience (phonexp): Self-declared level of formal phonetic
    training on a 5-point Likert scale from “none” to “expert”

age (age): Participant's self-declared age at time of study

height (height): Participant's self-declared height (meters) at time of
    study

weight (weight): Participant's self-declared weight (kilograms) at time
    of study
CHAPTER EIGHT


Rate variation in language change: Toward 
distributional phylogenetic modeling 


Chundra A. Cathcart


Abstract 


Since the advent of phylogenetic linguistics, researchers have used a large 
number of phylogenetic comparative methods adapted from computational
biology to model and analyze the dynamics of change of a wide range of
linguistic features. Models of this sort vary in complexity; the simplest
models of change assume homogeneity of transition rates within families,
while state-of-the-art models of heterotachy allow transition rates to
vary across lineages within a family. In this contribution, I review a
range of applications of biological models of rate variation to questions
in diachronic linguistics and highlight some models from computational
biology that have remained largely overlooked by linguists. Building off
of these and other biological models, I sketch out a program for what I
term DISTRIBUTIONAL PHYLOGENETIC MODELING, inspired by an analogous
recently proposed family of hierarchical Bayesian models. I report the results of
some work in progress carried out within this framework and present a case 
study illustrating the flexibility of the approach. 


INTRODUCTION 


Despite the longstanding recognition of a number of parallels between 
biological and linguistic change, linguistics arguably lags behind biology 
in developing tractable quantitative models capable of testing hypotheses 
regarding the nature of change. While a number of computational biolog- 
ical models can be extended to questions regarding language change, dia- 
chronic linguistics in general has proved hesitant in adopting these 
models wholesale. Some of the qualms involved are justified: despite 
analogs between biological and linguistic evolution, the two fields often 
have different conceptions of the ways in which the features under study
change. At the same time, phylogenetic models are increasingly designed 
with linguistic questions in mind, and as historical linguistics becomes 
more technically inclined, specialists in the field can assume a primary 
role in the development of such models. Popular models for categorical 
data types assume that features change according to a continuous-time 
Markov (CTM) process, a stochastic process parameterized by so-called 
transition rates, which characterize not only the speed with which transi- 
tions between feature values occur but also long-term trends towards par- 
ticular values. 
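
To illustrate how transition rates encode both speed and long-term trends, consider a binary feature with a gain rate (0 to 1) and a loss rate (1 to 0). The following is a minimal sketch using the well-known closed form for a two-state CTM process; the function and argument names are chosen here for illustration and do not come from any particular software package.

```python
import math

def transition_probs(gain, loss, t):
    """Transition probability matrix P(t) of a two-state CTM process.

    gain: rate of change from state 0 to state 1
    loss: rate of change from state 1 to state 0
    t:    elapsed time (e.g., a branch length)
    Entry [i][j] is the probability of being in state j after time t,
    given that the process started in state i.
    """
    total = gain + loss
    pi1 = gain / total   # long-run (stationary) probability of state 1
    pi0 = loss / total   # long-run (stationary) probability of state 0
    decay = math.exp(-total * t)  # how much of the starting state is remembered
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]

# Two rate pairs with the same long-term preference (20% for state 1)
# but a tenfold difference in how quickly that preference is reached.
slow = transition_probs(0.2, 0.8, t=1.0)
fast = transition_probs(2.0, 8.0, t=1.0)
```

In this parameterization the sum of the two rates governs how quickly the process loses memory of its starting state, while their ratio fixes the value toward which the feature tends over long time spans.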

In this contribution, I probe the flexibility of standard biological 
models of transition rate variation with respect to addressing questions in 
linguistics. I provide (perhaps unsurprising) evidence that the biological 
term HETEROTACHY, although explicitly defined as RATE variation in 
a large number of publications—and thus suggesting models which allow 
transition rates to vary across different regimes of change with relative 
freedom—implicitly carries the meaning of variation in SPEED (and not 
in other properties of transition rates) in most of the biological literature 
surveyed. I review a range of applications of these models to questions in 
linguistics and explore the range of questions that such models are 
capable of answering. Additionally, I draw attention to a small number of 
biological methods that model the joint co-evolution of phenotypic and 
genetic traits but have gone largely unnoticed in linguistics, possibly due 
to their difficulty of implementation and a lack of recognition of the par- 
allels between the phenomena described and comparable linguistic ques- 
tions. 

Taking these models as a starting point, I sketch out a framework that 
I term DISTRIBUTIONAL PHYLOGENETIC MODELING, which 
allows different properties of transition rates to vary according to differ- 
ent predictors, across features or lineages. This terminology pays homage 
to the newly developed framework of distributional statistical modeling, 
a family of regression models where both the expected location and dis- 
persion of a response value can vary across predictor values. In a similar 
vein, distributional phylogenetic modeling assesses the effect of different 
predictors on a binary feature’s speed of change (the frequency with 
which transitions occur regardless of the direction of transition), and sta- 
tionary probability (the long-term preference for a particular feature 
value). Some results of ongoing work in this framework are presented; 
additionally, I provide a case study showing how a distributional 
approach can potentially detect the role of areality in change in prosodic 
systems. I conclude by adumbrating additional potential applications for 
this method and discussing some challenges in generalizing this 
approach to non-binary data types. 
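
The idea of letting speed and stationary probability depend on predictors can be sketched as follows. This is only an illustration under assumptions made here: speed is taken to be the sum of the gain and loss rates, a log link is used for speed and a logit link for the stationary probability, and the coefficients are hypothetical. The parameterization actually adopted in the models discussed in this chapter may differ.

```python
import math

def rates_from_predictors(x, speed_coef, stat_coef):
    """Map a predictor value x to (gain, loss) rates of a binary feature.

    speed_coef: (intercept, slope) on the log scale for speed
    stat_coef:  (intercept, slope) on the logit scale for the
                stationary probability of state 1
    Assumes speed = gain + loss and stationary probability =
    gain / (gain + loss), so the two properties can vary independently.
    """
    b0, b1 = speed_coef
    c0, c1 = stat_coef
    speed = math.exp(b0 + b1 * x)                       # always positive
    pi1 = 1.0 / (1.0 + math.exp(-(c0 + c1 * x)))        # between 0 and 1
    gain = speed * pi1
    loss = speed * (1.0 - pi1)
    return gain, loss

# Hypothetical coefficients: the predictor raises both the speed of
# change and the long-term preference for state 1.
gain, loss = rates_from_predictors(0.5, speed_coef=(0.1, 0.4), stat_coef=(-1.0, 2.0))
```

Separating the two linear predictors mirrors distributional regression, where location and dispersion each receive their own set of coefficients.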




RATE VARIATION IN BIOLOGICAL AND LINGUISTIC CHANGE 


A number of biological models account for a phenomenon known as 
HETEROTACHY, derived from Greek takhus ‘swift’ and defined at 
times as speed variation (Lopez et al. 2002) but also in a large number of 
publications as rate variation (e.g., Meade and Pagel 2008: 30). Models 
of heterotachy allow properties of trait evolution to vary according to dif- 
ferent regimes, across features and/or within phylogenies. However, 
from the concise definition given above, there is ambiguity regarding the 
exact quantities that vary across regimes — is there variation in the rate 
of change, i.e., the overall speed with which the system changes, or in 
transition rates more generally, which control not only the speed of the 
system but preferences for individual states? For phylogenetic linguis- 
tics, models which account for a broader understanding of rate variation 
are essential, as both properties of change are of importance to the field, 
as well as the extent to which dynamics of change vary as a function of 
different predictor variables. Below, I assess the extent to which existing 
biological models fulfill these desiderata. 

As far as speed of change is concerned, a large body of work argues 
that linguistic features change at different speeds under different circum- 
stances: for instance, different languages display divergent rates of 
vocabulary replacement (Bergsland and Vogt 1962), pointing to different 
dynamics of lexical change in different phylogenetic lineages. A well- 
known but controversial hypothesis argues that language change is char- 
acterized by regimes of equilibrium, involving slow change, and punctu- 
ation, involving rapid change (Dixon 1997; for a critical appraisal of this 
specific view, see Bowern 2006). Other work, some of it in a phyloge- 
netic framework, links different rates and trends in language change to 
differences in population size, different societal dynamics, and differ- 
ences in social isolation (Nettle 1999; Greenhill et al. 2018). 
Additionally, a large body of research explores the role of large-scale lan- 
guage contact in accelerating language change (McWhorter 2007), par- 
ticularly in extreme cases such as the formation of pidgins and creoles. 

In addition to differences in the speed of change, regimes of language 
change can differ in terms of a preference for a given feature state and are 
thus characterized by different long-term biases toward feature values. 
These preferences may depend on another linguistic feature or one or 
more extra-linguistic features. It is argued, for instance, that labiodental 
sounds like f and v became easier to pronounce following changes to bite
configuration associated with shifts in subsistence patterns and diet, and 
were thus more likely to be used as speech sounds (Blasi et al. 2019). 
Anatomical research also suggests that the click sounds found in lan- 
guages of southern Africa are a relatively recent response to changes in 
physiology (Dediu et al. 2017, 2021; Moisik and Dediu 2017). It is 
additionally argued that social isolation is linked to the maintenance of 
linguistic complexity, with simplification brought about by adult second- 
language learners (Bentz and Winter 2013; Trudgill 2001). As in the case 
of phonological inventories, environmental factors have also been
invoked as an influence on static sound patterns within languages. For 
instance, it has been claimed that languages spoken in drier climates use 
fewer vowels (Everett 2017) and that the same pattern holds for lan- 
guages spoken in colder climates (Maddieson 2018), due to the fact that 
dry air creates articulatory problems for phonation needed to produce 
vowels, and high temperatures degrade the high-frequency spectral infor- 
mation helpful in perceiving consonant clusters. In addition to environ- 
mental factors, genetic factors have been implicated in biases toward lin- 
guistic tone (Dediu 2021). Studies demonstrate robustly that tonal lan- 
guages are spoken in regions of higher humidity (Everett et al. 2015; 
Roberts 2018), due to the fact that lower jitter in fundamental frequency, 
a property of humid environments, makes it easier to stabilize fundamen- 
tal frequency and exapt it for linguistic purposes. Hypotheses that posit 
the direct influence of the environment on sound patterns are highly con- 
troversial (Haynie 2014; Urban 2020), as it is difficult to tease apart the 
correlated influence of environment, areality, and genetics, and it has 
been argued in recent years that hypotheses regarding the influence of the 
environment on sound patterns (Everett 2013) are better analyzed as 
sociolinguistic isolation (Urban and Moran 2021). Regardless of the fac- 
tors involved, a growing body of evidence robustly attests that phyloge- 
netic lineages are characterized by different regimes of change that vary 
both in the speed of change and preferences for individual features. 

The quantitative turn in historical linguistics has seen an increase in 
the application of phylogenetic methodologies to questions in historical 
linguistics. A popular model for the evolution of categorical linguistic 
data assumes that features undergo state changes over a phylogeny 
(usually inferred a priori on the basis of lexical data) according to a con- 
tinuous-time Markov (CTM) process. The transition rates of the process 
can be estimated (usually via Bayesian inference) and estimated values 
can be used to test hypotheses regarding the dynamics of change in the 
features in question. To date, the majority of applications of the CTM 
model in linguistics assume that transition rates between features or pairs 
of features do not vary within phylogenies (Carling and Cathcart 2021; 
Cathcart et al. 2020; Dunn et al. 2017; Haynie and Bowern 2016; Shirtz 
et al. 2021). If data from multiple families are analyzed, phylogenetic 
models are generally fitted separately for each family, with different rates 
for each family (Dediu 2010; Dunn et al. 2011).¹ Although the rate
homogeneity assumption may be overly simplistic, given the evidence for
different rate regimes in language change mentioned above, this assumption
has a number of advantages: given the relatively small number of
parameters in rate-homogeneous models, they are computationally efficient
to fit, and their restrictiveness may allow for less uncertainty in
posterior estimates than in more elaborate models. Additionally, the
questions asked in the work cited here do not concern rate heterogeneity
but rather the long-term dynamics of feature change; recent work even
finds support for rate homogeneity across rather than within families
(Jäger and Wahle 2021). At the same time, rate heterogeneity, both across
features and lineages, is central to a number of outstanding questions in
linguistics, including those raised at the beginning of this section.

¹ Note that some of these papers employ the Discrete model (Pagel 1994) and
related methods (Pagel and Meade 2006), which can be interpreted as models
of heterotachy; see below.

A number of well-known models of heterotachy account for variation 
in speed across evolving features as well as within phylogenies. Models 
accounting for within-phylogeny variation can be subdivided into 
models which assume variation to be fixed at the branch level, and those 
that allow multiple rate regimes to be visited on a single branch of a phy- 
logeny. The latter type is best represented by the COVARION model, a 
prominent way of representing heterotachy (Fitch 1971; Tuffley and 
Steel 1998; Wang et al. 2007). A basic covarion model for binary data 
assumes a four-state CTM process with “hot” and “cold” regions charac- 
terized by normal and slow or nonexistent change. The system can transi- 
tion from hot to cold regions or vice versa but change between presence 
and absence is more frequent in hot regions than in cold ones. The hidden 
rates model is a generalization of the covarion model that allows gain and 
loss rates to differ across rate regimes in a less constrained manner, not 
solely according to speed, and additionally can accommodate more than 
two rate classes (Beaulieu and O’Meara 2014). 
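
The structure of a basic covarion model for a binary feature can be made concrete by writing out its four-state rate matrix. The sketch below is illustrative only: the state ordering, the argument names and the use of a single slow-down factor for the cold regime are assumptions made here, not the specification of any published implementation.

```python
def covarion_rate_matrix(gain, loss, to_hot, to_cold, cold_factor=0.0):
    """Rate matrix of a four-state covarion-style CTM process.

    States are ordered (0, cold), (1, cold), (0, hot), (1, hot).
    gain / loss:  rates of 0->1 and 1->0 change in the hot regime
    to_hot:       rate of switching from the cold to the hot regime
    to_cold:      rate of switching from the hot to the cold regime
    cold_factor:  multiplier in [0, 1) applied to gain and loss in the
                  cold regime (0 means no change at all while cold)
    """
    q = [[0.0] * 4 for _ in range(4)]
    # Feature change within the cold regime (slow or nonexistent).
    q[0][1] = cold_factor * gain
    q[1][0] = cold_factor * loss
    # Feature change within the hot regime (normal speed).
    q[2][3] = gain
    q[3][2] = loss
    # Regime switches leave the feature value untouched.
    q[0][2] = to_hot
    q[1][3] = to_hot
    q[2][0] = to_cold
    q[3][1] = to_cold
    # Diagonal entries make every row sum to zero.
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q

Q = covarion_rate_matrix(gain=1.0, loss=2.0, to_hot=0.3, to_cold=0.5)
```

Because the cold regime rescales both rates by the same factor, the two regimes differ only in speed; a hidden rates model would instead give each regime its own freely varying gain and loss rates.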

Other alternatives to the covarion model assume that speed variation 
is fixed at the branch level (Heath et al. 2011; Pagel and Meade 2008). 
Unlike the covarion model, within-branch transitions between rate 
classes are not permitted.” At the same time, branch-level models are 
more flexible than covarion models in that it is straightforward to 
account for a greater number of rate regimes to an extent that would be 
computationally costly under a covarion model, as it would involve a 
high-dimensional rate matrix (Irvahn and Minin 2014). The models cited 
above allow variation in speed, but not necessarily in branch-level trends 
toward some feature value. Limiting variation to speed enables some 
computational tricks, as differences in speed can be modeled at the 
branch level by directly manipulating the branch length rather than the 
rates of the CTM process. 
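
The branch-length device mentioned above rests on the fact that multiplying all rates by a speed factor has the same effect on transition probabilities as stretching the branch by that factor. The following minimal check uses the closed form for a two-state process; the names and numerical values are illustrative assumptions.

```python
import math

def transition_probs(gain, loss, t):
    """Transition probability matrix of a two-state CTM process
    with gain rate (0->1), loss rate (1->0) and elapsed time t."""
    total = gain + loss
    pi1 = gain / total
    pi0 = loss / total
    decay = math.exp(-total * t)
    return [
        [pi0 + pi1 * decay, pi1 - pi1 * decay],
        [pi0 - pi0 * decay, pi1 + pi0 * decay],
    ]

gain, loss = 0.3, 0.7
branch_length = 1.5
speed = 4.0  # hypothetical branch-specific speed multiplier

# Option 1: scale the rates on this branch.
scaled_rates = transition_probs(speed * gain, speed * loss, branch_length)
# Option 2: keep the rates fixed and stretch the branch instead.
scaled_branch = transition_probs(gain, loss, speed * branch_length)

max_difference = max(
    abs(a - b)
    for row_a, row_b in zip(scaled_rates, scaled_branch)
    for a, b in zip(row_a, row_b)
)
```

The two options agree up to floating-point error, which is why a single rate matrix can be reused across branches when only speed varies. The equivalence breaks down as soon as the ratio of gain to loss is allowed to differ between branches.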

Similar methods are used to account for variation in rates across fea- 
tures. Huelsenbeck and Suchard (2007) allow the speed of change of dif- 
ferent features to vary across speed classes by manipulating the total tree 
length across rate classes. A notable linguistic study is that of
Greenhill et al. (2017), which analyzes change in lexical and
morphosyntactic features in Austronesian, allowing speeds to vary across
branches as well as features (the authors coestimate speeds of change
along with language phylogenies, rather than assuming the phylogeny a
priori).

² A reviewer notes that this distinction is somewhat trivial, as
within-branch state transitions under a covarion model account for the
cumulative expected amount of change on the branch as a whole.

Ultimately, an overwhelming number of off-the-shelf models of het- 
erotachy account solely for variation in speeds of change. One obvious 
reason is the fact that methods of this sort are designed for modeling 
change in DNA sequences, where in most circumstances there are no 
clear biases towards particular bases (though see below). Another issue is 
the fact that models of this sort rely on Markov chain Monte Carlo 
(MCMC) methods, which must satisfy certain constraints in order to effi- 
ciently explore parameter space. Many MCMC methods suffer when 
exploring high-dimensional posteriors; hence, simpler models are easier 
to tune such that proposals are efficient. Models of this sort are readily 
available to linguists interested in speed variation, but not necessarily in 
other dynamics of change. Furthermore, these methods do not directly 
model the role of different linguistic and extra-linguistic predictors in 
explaining variation in the dynamics of change. One model capable of 
doing this is the Discrete model of correlated evolution (Pagel 1994), 
under which gain and loss rates for one feature vary freely according to 
the presence or absence of another feature; this model requires a binar- 
ized representation of both features. 
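
The dependence structure of the Discrete model can be illustrated by constructing the rate matrix over the four joint states of two binary features. The sketch below is an illustration with hypothetical argument names, not a reproduction of any existing implementation: each feature has gain and loss rates indexed by the current value of the other feature, and only one feature may change at a time.

```python
def discrete_rate_matrix(gain_a, loss_a, gain_b, loss_b):
    """Rate matrix for two co-evolving binary features A and B.

    gain_a[b], loss_a[b]: gain and loss rates of A when B has value b
    gain_b[a], loss_b[a]: gain and loss rates of B when A has value a
    Returns (states, q), with joint states ordered
    (0, 0), (0, 1), (1, 0), (1, 1) as (value of A, value of B).
    """
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    q = [[0.0] * 4 for _ in range(4)]
    for i, (a, b) in enumerate(states):
        for j, (a2, b2) in enumerate(states):
            if a != a2 and b == b2:
                # Only A changes; its rate depends on the value of B.
                q[i][j] = gain_a[b] if a == 0 else loss_a[b]
            elif a == a2 and b != b2:
                # Only B changes; its rate depends on the value of A.
                q[i][j] = gain_b[a] if b == 0 else loss_b[a]
            # Simultaneous change in both features keeps rate zero.
        q[i][i] = -sum(q[i])  # diagonal is still zero at this point
    return states, q

# Hypothetical rates: A is gained five times faster when B is present.
states, Q = discrete_rate_matrix(
    gain_a=(0.2, 1.0), loss_a=(0.5, 0.5),
    gain_b=(0.3, 0.3), loss_b=(0.4, 0.4),
)
```

Setting the two entries of each rate pair equal recovers independent evolution of the two features, which is the baseline against which correlated evolution is typically assessed.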

Despite the fact that the most accessible models of heterotachy model 
only variation in speed and are thus not expressive enough to capture the 
full range of phenomena of interest to phylogenetic typology, other 
models, albeit less well-known ones, may be relevant to the needs of the 
subfield. There exist a number of biological models which jointly charac- 
terize the evolution of continuous phenotypic and discrete genetic traits 
across lineages within a phylogeny (see Bromham 2009; Bromham et al. 
1996 on some relevant phenomena). These are overlooked in the phy- 
logenetic linguistics literature, because at first blush there is no clear con- 
nection between the biological phenomena they capture and linguistic 
processes we may wish to model. Lartillot and Poujol (2011) model cor- 
relations between several phenotypic variables (maturity, mass, and lon- 
gevity) and the ratio of nonsynonymous (i.e., transitions between nucleo- 
tides that alter the amino acid sequence of a protein) to synonymous sub- 
stitutions. A handful of papers (Horvilleur and Lartillot 2014; Lartillot 
2013) deal with so-called GC equilibrium or GC-biased gene conversion, 
a phenomenon where repairs to certain DNA mismatches may favor 
strong bases (G and C) over their weak counterparts (A and T), and pos- 
sible extra-genetic correlates or determinants thereof. While it is chal- 
lenging to find a direct analog in linguistics for the biological phenomena 
to which these methods are applied, the importance of models of this sort 
is clear: they provide a means of representing relationships between
co-evolving continuous and discrete features off of which phylogenetic
linguistic methods can build in order to explore how changes in a
continuous extra-linguistic feature influence long-term preferences for
values of a discrete linguistic feature, as well as its speed of change
(although these models do not explicitly address speed) (Table 1).


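                        | Speed variation               | Transition rate variation
------------------------+-------------------------------+----------------------------
Predictors not modeled  | Huelsenbeck and Suchard 2007; | Beaulieu and O'Meara 2014
                        | Pagel and Meade 2008;         |
                        | Tuffley and Steel 1998, etc.  |
------------------------+-------------------------------+----------------------------
Predictors/correlates   |                               | Lartillot and Poujol 2011;
modeled                 |                               | Pagel 1994, etc.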


Markov chain Monte Carlo (MCMC) implementations of the models
described above are not straightforward for researchers to modify when
they wish to apply the models to non-biological data. Fortunately,
specialists lacking the expertise to tune and modify MCMC algorithms for
the purpose of efficient sampling can make use of a growing set of
probabilistic programming languages, which require only a specification
of the generative process thought to underlie the data. One such
language is Stan (Carpenter et al. 2017), which uses an adaptive version
of Hamiltonian Monte Carlo, a gradient-based method that avoids the
random walk behavior of traditional MCMC approaches, making it possible
to infer larger numbers of parameters and employ flexible prior
distributions over parameters. Sophisticated phylogenetic models have
recently been implemented in Stan, including phylogenetic causal
modeling (Ringen et al. 2021). Languages like Stan can be used to recast
some of the models described above in order to make them more flexible,
with some limitations. A salient limitation of gradient-based
probabilistic programming languages like Stan is that discrete
parameters cannot be directly sampled and must be marginalized out,
which is infeasible under some circumstances. This rules out the
possibility of modeling branch-specific rate classes, which involves
enumerating a number of configurations of rate class membership that
grows exponentially with the number of branches in a phylogeny, and
makes modeling feature-specific rate classes computationally costly for
large numbers of features and rate classes. In what follows, I move away
from the notion of discrete rate classes in favor of approaches that
allow speeds and preferences to vary at both the feature and branch
level as a function of one or more predictors.
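
To give a sense of the scale of the enumeration involved in marginalizing branch-specific rate classes, a rough count can be written down. The symbols K (number of rate classes) and B (number of branches) and the numerical example are chosen here purely for illustration, under the assumption that each branch may belong to any class:

```latex
\[
  \#\{\text{rate class configurations}\} = K^{B},
  \qquad \text{e.g. } K = 2,\ B = 100:\quad 2^{100} \approx 1.27 \times 10^{30}.
\]
```

Even two rate classes on a modest tree therefore yield far more configurations than can be summed over directly.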


DISTRIBUTIONAL MODELING 


In this section, I motivate a method for assessing the effect of different
predictors on multiple components of language change for binary
linguistic features. Specifically, this method decouples transition rates
into the overall speed of change, i.e., the scale of the change rate
between feature states irrespective of the direction of change, and the
stationary probability of feature presence, interpretable as the
long-term preference for a given feature.

Fig. 1.

Binary continuous-time Markov process representing changes between
presence and absence of a feature. Transitions are annotated according
to the gain rate ($\alpha$) and loss rate ($\beta$) of the feature, with
alternative parameterizations according to speed of change ($s$) and
stationary probability of feature presence ($\pi$) provided as well:
$\alpha = \pi s$, $\beta = (1 - \pi)s$.

Fig. 2.

Simulated CTM processes showing transitions between states of a binary
feature under different speeds ($s \in \{5, 10\}$) and stationary
probabilities ($\pi \in \{0.1, 0.9\}$).

Under the standard view of a continuous-time Markov process for
binary data, a feature arises and is lost according to a gain rate and a
loss rate. Assuming a speed of change $s$ and stationary probability
$\pi$, the gain rate and loss rate can be rewritten as $s\pi$ and
$s(1 - \pi)$, respectively (see Fig. 1). This is the binary case of a
general time-reversible model (Tavaré 1986), which parameterizes changes
between multiple states in a continuous-time Markov chain according to
stationary probabilities of state presence and exchange rates (identical
to what I term the speed of change) between each pair of states.
Simulated trajectories of change under binary CTM processes with
different speeds and stationary probabilities are found in Figure 2.
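
The speed and stationary-probability parameterization can be made concrete with a short Python sketch. This is only an illustration under the definitions above, not code from the studies discussed; the function names are hypothetical, and the closed-form transition probabilities are the standard result for a two-state chain.

```python
import math
import random

def gain_loss_rates(s, pi):
    """Gain and loss rates implied by speed s and stationary probability pi."""
    return s * pi, s * (1.0 - pi)

def transition_matrix(s, pi, t):
    """P(t) for the two-state chain; rows and columns are ordered (absent, present)."""
    gain, loss = gain_loss_rates(s, pi)
    total = gain + loss               # equals s under this parameterization
    decay = math.exp(-total * t)      # how much of the starting state is still "remembered"
    p1 = gain / total                 # long-run probability of presence (equals pi)
    return [
        [1.0 - p1 * (1.0 - decay), p1 * (1.0 - decay)],
        [(1.0 - p1) * (1.0 - decay), p1 + (1.0 - p1) * decay],
    ]

def simulate(s, pi, t_max, rng, state=0):
    """Return the (time, new_state) change events of one trajectory on [0, t_max]."""
    gain, loss = gain_loss_rates(s, pi)
    t, events = 0.0, []
    while True:
        rate = gain if state == 0 else loss
        t += rng.expovariate(rate)    # exponential waiting time until the next change
        if t > t_max:
            return events
        state = 1 - state
        events.append((t, state))

# Example: a fast-changing feature that is usually present.
events = simulate(s=10.0, pi=0.9, t_max=1.0, rng=random.Random(0))
```

Raising s shortens the waiting times between changes in either direction, whereas raising pi shifts the share of time spent in the state "present" without altering the overall scale of change.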


Allowing both the speed of change and stationary probability of a
feature to vary according to one or more predictors invites analogies
with a recently proposed innovation in hierarchical Bayesian modeling,
namely so-called DISTRIBUTIONAL MODELS, which allow both the location
and scale of a regression model to vary as a function of predictor
variables (Bürkner 2017), thus relaxing a number of assumptions found in
classical linear regression, such as homoskedasticity. In the same vein,
distributional phylogenetic modeling can allow us to understand which
properties of change vary according to different predictor variables,
both within and across features. A more refined understanding of whether
certain extralinguistic predictors of change affect speed versus biases
toward a particular feature value can be crucial to our understanding of
language change. For instance, it could be the case that what appears to
be the result of increases in complexity in regions of greater social
isolation is simply an artifact of more rapid change in regions of less
isolation. Decoupling these components of change provides a principled
method for investigating hypotheses of this sort, across both lineages
and linguistic features.

In what follows, I illustrate the results of work that investigates the 
role of different predictors in explaining variation in speed of change and 
stationary probability of presence at the featural level, using a large lin- 
guistic data set. Subsequently, I present the results of a case study that 
models differences in speed at the branch level as a function of an extra- 
linguistic feature, namely geospatial dynamics. 


CASE STUDY: ROMANCE VERBAL MORPHOLOGY 


Cathcart et al. (2022) conduct a study in a distributional phylogenetic 
framework of the evolution of stem alternations in Romance verbs. Stem 
alternations in Romance verb paradigms are of particular interest to dia- 
chronic linguistics (Esher 2016; Herce 2019; Maiden 2018, etc.). 
Romance verbal paradigms often exhibit so-called morphomic patterns 
constituting stem allomorphy that is neither phonologically nor semanti- 
cally motivated. Despite their irregularity, these patterns are highly stable 
and are frequently extended to new verbs. The philological literature 
identifies three main types of stem alternations in Romance verbs, 
labeled N, L, and P(YTA); these can co-occur within verbal paradigms. 
The emergence of the N and L patterns occurred as a result of sound 
changes after Classical Latin but (largely) before the break-up of 
Romance into different languages. Unlike the other two patterns, the P 
pattern is inherited from Latin, stemming from a semantic distinction that 
is no longer present in modern Romance languages, leading to alterna- 
tions that are arbitrary from the perspective of meaning. 

Data from the Oxford Online Database of Romance Verb Morphol- 
ogy (Beniamine et al. 2020; Maiden et al. 2010) were manually coded 
according to whether or not they exhibited each of the three pattern types 
(which can co-occur within individual paradigms), yielding three possi- 
ble lemma-pattern pairs per lemma in each language. In total, the data 
analyzed comprised 171 lemma-pattern pairs involving 66 lemmas from 
67 Romance speech varieties (lemma-pattern pairs exhibiting no
variation between the states present and absent were excluded). A
Romance phylogeny, inferred using RevBayes (Höhna et al. 2016) on the
basis of both automatically generated lexical cognacy data (Jäger 2018)
and sound class data indicating speech sounds that are present in each
variety (Heggarty et al. 2019), was used to carry out distributional
phylogenetic modeling.

As morphomic pattern types can co-occur within the same lexeme’s
verbal paradigm, each lemma-pattern pair is assumed to evolve
independently according to a binary CTM process between the states
PRESENT and ABSENT. The gain and loss rates of a lemma-pattern pair with
index $d \in \{1, \ldots, D\}$ are $\pi_d s_d \rho$ and
$(1 - \pi_d) s_d \rho$. Here, $s_d \rho$ represents the speed of change
for feature $d$, $s_d$ being a multiplier of the global speed $\rho$.
The global speed $\rho \sim \mathrm{Uniform}(0, 10)$ is scaled from
feature to feature by the multiplier $s_d$, and its bounded prior
prevents changes from happening more frequently than once per century.
The parameter $\pi_d$ is the stationary probability for the
lemma-pattern pair in question. For each feature, the likelihood
$P(x_d \mid s_d, \pi_d, \rho, \Psi)$ can be computed using Felsenstein’s
pruning algorithm (Felsenstein 1981, 2004), where $x_d$ is a vector of
values indicating the presence or absence of a given lemma-pattern pair
in the languages in the data set, and $\Psi$ is a phylogeny. Both the
speed of change and stationary probability for each lemma-pattern pair
can be modeled as a function of multiple predictors, making it possible
to assess the effect of different factors on both speed and stationary
probability. Both $s$ and $\pi$ are logit-normally distributed, as
follows:


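\[
\operatorname{logit} s_d \sim \mathrm{Normal}\left(\alpha^{s} + \beta^{s,\mathrm{LEMMA}}_{\mathrm{LEMMA\ ID}_d} + \beta^{s,\mathrm{PATTERN}}_{\mathrm{PATTERN\ ID}_d},\ \sigma^{s}\right) \tag{1}
\]

\[
\operatorname{logit} \pi_d \sim \mathrm{Normal}\left(\alpha^{\pi} + \beta^{\pi,\mathrm{LEMMA}}_{\mathrm{LEMMA\ ID}_d} + \beta^{\pi,\mathrm{PATTERN}}_{\mathrm{PATTERN\ ID}_d},\ \sigma^{\pi}\right) \tag{2}
\]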


In each sampling statement, $\alpha$ represents an intercept,
$\beta^{\mathrm{LEMMA}}$ represents the contribution of each lemma type,
and $\beta^{\mathrm{PATTERN}}$ represents the contribution of each
alternation type. The contribution of lemma type is modeled as a
monotonic function (Bürkner and Charpentier 2020) of each lemma’s
frequency in Latin texts. Pattern type is dummy-coded, modeling
comparisons of the levels L and P to N, respectively. Normal(0,1) priors
are placed over all model parameters in statements (1-2) with the
exception of simplex parameters and standard deviations $\sigma$, which
receive Dirichlet(1,...,1) and HalfNormal(0,1) priors, respectively.
Posterior distributions for parameters are inferred using the R package
CmdStanR (Gabry and Češnovar 2021). The resulting posterior coefficients
for model predictors (given in Fig. 3) serve to clarify some aspects of
morphological change that were previously poorly understood or
underappreciated. Interestingly, none of the predictors has a decisive
effect on the speed of change of a lemma-pattern pair, as the 95%
credible intervals for these parameters all include zero. However,
frequency has a decisive effect on the stationary probability of pattern
presence ($\beta^{\pi,\mathrm{LEMMA}}$). This result is interesting in
light of a large body of research that links frequency of usage to speed
of change in vocabulary replacement (Pagel et al. 2007, though see also
Wilson et al. 2019). In the case of Romance verbal paradigms, lexical
frequency does not explain variation in speed of change, but rather
variation in other properties of change. A tentative interpretation of
this result is that the loss versus maintenance of irregularity in less
versus more frequent forms is not an accident brought about by the
instability of less frequent forms, but rather reflects the evolutionary
advantage of maintaining irregular patterns in highly frequent verbs.
The distributional model developed can be straightforwardly extended to
model branch-level trends in change (pertaining to both speed and
preferences for general irregularity or specific irregular patterns) and
can incorporate a wide range of predictors.
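
The likelihood computation referred to above can be illustrated with a minimal sketch of Felsenstein's pruning algorithm for a single binary feature. This is an illustration in Python rather than the Stan code used in the study; the nested-list tree encoding, the function names, the toy branch lengths and parameter values, and the uniform root prior are all assumptions made here for the example.

```python
import math

def transition_probs(gain, loss, t):
    """Two-state transition probabilities over a branch of length t (rows: start state)."""
    total = gain + loss
    decay = math.exp(-total * t)
    p1 = gain / total
    return [
        [1.0 - p1 * (1.0 - decay), p1 * (1.0 - decay)],
        [(1.0 - p1) * (1.0 - decay), p1 + (1.0 - p1) * decay],
    ]

def partial_likelihoods(node, gain, loss):
    """Likelihood of the tips below `node`, conditional on each state (0, 1) at `node`.

    A tip is an int (observed state); an internal node is a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if node == k else 0.0 for k in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        child_like = partial_likelihoods(child, gain, loss)
        probs = transition_probs(gain, loss, length)
        for k in (0, 1):
            # Sum over the unobserved state at the child end of the branch.
            result[k] *= sum(probs[k][m] * child_like[m] for m in (0, 1))
    return result

def feature_likelihood(tree, s, pi, rho, root_prior=(0.5, 0.5)):
    """Likelihood of one feature's tip states given speed multiplier, preference, global speed."""
    gain, loss = pi * s * rho, (1.0 - pi) * s * rho
    like = partial_likelihoods(tree, gain, loss)
    return root_prior[0] * like[0] + root_prior[1] * like[1]

# Toy tree ((A:0.3, B:0.3):0.2, C:0.5) with the feature present in A and B, absent in C.
toy_tree = [([(1, 0.3), (1, 0.3)], 0.2), (0, 0.5)]
toy_value = feature_likelihood(toy_tree, s=0.5, pi=0.3, rho=4.0)
```

Because the recursion visits each branch once, the cost grows linearly with the number of branches, which is what makes it practical to evaluate the likelihood inside a sampler at every iteration.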


CASE STUDY: STRESS SYSTEMS OF THE WORLD'S LANGUAGES 


In this section, I present the results of a case study designed to demon- 
strate the flexibility of a distributional approach in assessing the role of 
different predictors across multiple features and lineages. Importantly, 
the model I construct probes the influence of geospatial factors on across- 
lineage trends in change. Though highly preliminary at this stage, this 
work further advances phylogenetic linguistics in the direction of 
accounting for horizontal as well as vertical pressures in language 
change. 

I apply the model developed to the question of change in stress sys- 
tems. Languages differ widely in terms of the suprasegmental systems 
they exhibit. Language change can involve drastic transitions between 
types of prosodic behavior. For instance, Old Latin had fixed stress on 
initial syllables, whereas Classical Latin developed stress on penultimate 
or antepenultimate syllables, depending on vowel length (Penney et al. 
2011); additionally, Old Chinese is believed to have lacked tone, yet lex- 
ical tone developed in later Chinese varieties via tonogenesis (Baxter 
1992). Changes in prosodic systems are well studied. In particular, the 
emergence of tone is linked to voicing and other acoustic properties that 
may be enhanced by environmental and genetic factors. At the same time, 
many aspects of change in stress systems are not fully understood. While 
undoubtedly many factors are involved, language contact is frequently 
invoked as a source of prosodic change (Pronk 2018; Rice 2014). For 
instance, the presence of initial stress among genetically distantly related 
languages of Central Europe such as Hungarian and Czech is highly con- 
spicuous, and usually attributed to contact. (Fig. 3) 




As a means of assessing the role of geography in linguistic change, I 
jointly model the phylogeographic diffusion of language families along 
with a distributional CTM model of the evolution of features pertaining 
to prosody. I assume a relaxed random walk (RRW) model of
phylogeography, under which geospatial diffusion takes place according
to a process of Brownian motion, the scale of which exhibits
branch-level variation. This model and some extensions serve as the
standard for modeling linguistic migration (Gill et al. 2017; Lemey et
al. 2010), and have been shown to accurately capture properties of a
language family’s spread when the
process of spread involves expansion from a given point of origin, but not 
necessarily when wholesale migration from the point of origin has taken 
place (Neureiter et al. 2021); accordingly, the RRW and its extensions 
may not be appropriate for all of the world’s language families. The basic 
RRW employed in this paper is not sensitive to environmental features in 
the way that more sophisticated models are (Bouckaert et al. 2012, 2018; 
Koile et al. 2022). 

I take branch-level diffusion rates as my key geospatial parameter of
interest. These parameters model fluctuation in the speed of migration of 
different phylogenetic lineages. More rapid migration on the part of a 
speech community has the potential to bring speakers into contact with 
speakers of other languages, increasing the possibility of language shift 
among adults and rapid changes in typological profile. If this view is 
accurate, higher rates of geospatial diffusion should coincide with faster 
speeds of featural change on the same branch. This hypothesis — that the 
speed of geospatial diffusion has a positive effect on the speed of linguis- 
tic change — is perhaps simplistic, but the investigation carried out here 
opens the door for more nuanced studies of this broader question. 

I use data from chapters 14-15 (Fixed Stress Locations and
Weight-Sensitive Stress; Goedemans and van der Hulst 2013a, 2013b) of
the World Atlas of Language Structures (Dryer and Haspelmath 2013). The
data were recoded into 12 binary traits (Antepenultimate, Initial, Penulti- 
mate, Second, Third, Ultimate, Left Edge, Left Oriented, Not Predict- 
able, Right Edge, Right Oriented, Unbounded), removing redundant fea- 
tures logically dependent on other feature values. In theory, multiple 
stress types can co-occur within languages in different lexical strata, 
making a binary data type an appropriate representation for these feature 
values. Whether or not all features can be absent in a given language 
(e.g., in sign languages, in certain tone languages) gets further into the 
domain of theoretical analyses outside the scope of this paper (Duanmu 
2004; Hyman 2006); this behavior would be a potentially unintended 
consequence of a binary data type. 

To prepare the data for phylogenetic analysis, I used the workflow 
designed by Jäger and Wahle (2021). I merged each glottocode in the
recoded WALS data with one or more corresponding taxa in the ASJP 
dataset (Wichmann et al. 2018). This expanded the 485 languages in the 
WALS sample into 779 ASJP taxa for which stress data are available. I 
inferred phylogenies for the families in the augmented data set. In sum,
the augmented data contained taxa from 57 families, of which 13
contained only two members, as well as 55 taxa that were the sole
representatives of their family. I limited my analyses to 698 languages
from the 44 families comprising more than two members. Data and code can
be found at
https://github.com/chundrac/phylogeneticTypology. 

I fit three models of increasing complexity. The first of these 
(MODEL) assumes that each feature in the data set evolves in each phy- 
logeny according to a global change rate and stationary probability. The 
second of these (MODEL-FAM) allows change rates and stationary 
probabilities for features to vary across families. The third of these 
(MODEL-FAM-GEO) builds upon MODEL-FAM, combining an RRW 
model of phylogeography with a CTM model of character evolution in 
order to detect fluctuation in the speed of change that may be explained 
by variability in rates of phylogeographic diffusion at the branch level. 
Under time-homogeneous Brownian motion, displacement in a trait value
between times $s$ and $t$, $x_t - x_s$, is normally distributed with a
mean of 0 and a variance of $\sigma(t - s)$, where $\sigma$ is a
parameter controlling the overall scale of diffusion. The RRW model
multiplies $\sigma$ by a branch-specific scale $\rho_b$ for a branch
with index $b$, which allows for faster or slower displacement on
different branches. MODEL-FAM-GEO thus allows change rates and
stationary probabilities for features to vary across families, and also 
allows branch-level speeds of change to vary as a function of the phylo- 
geographic diffusion rates of corresponding branches. A detailed model 
specification can be found in the Appendix. 

Model comparison was carried out via Pareto-smoothed importance 
sampling leave-one-out (PSIS-LOO) cross-validation (Vehtari et al. 
2017). Posterior samples (including log-likelihood values for each sam- 
ple) were aggregated across all 10 posterior distributions. Differences in 
expected log predictive density (ELPD) values across models were
calculated using the function loo_compare in the R package loo (Vehtari
et al. 2020). The table below gives differences in ELPD between each
model and
the model with the largest ELPD (the model in the first row), along with 
standard errors of the differences. In general, if the absolute difference is 
greater than two standard errors, the model with the higher ELPD is a 
decisively better fit to the data. I also carry out model stacking (Yao et al. 
2017), which averages predictive distributions of different models to 
generate weights representing their relative predictive power; weights 
are provided below. 


Model            ΔELPD    SE    Weight
MODEL-FAM-GEO      0       0      1
MODEL-FAM        -43.1     9      0
MODEL            -66.4    16      0
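
The two-standard-error rule of thumb described above can be checked directly against the reported values. The snippet below is a small Python sketch; the helper name is hypothetical, and the numbers are simply those in the table, with both comparisons made against MODEL-FAM-GEO.

```python
# (model, ELPD difference from the best model, standard error of that difference)
comparisons = [("MODEL-FAM", -43.1, 9.0), ("MODEL", -66.4, 16.0)]

def decisively_worse(elpd_diff, se_diff, threshold=2.0):
    """True if the absolute ELPD difference exceeds `threshold` standard errors."""
    return abs(elpd_diff) > threshold * se_diff

# Ratio of each absolute difference to its standard error, rounded for display.
ratios = {name: round(abs(diff) / se, 2) for name, diff, se in comparisons}
verdicts = {name: decisively_worse(diff, se) for name, diff, se in comparisons}
```

Both differences are more than four standard errors in magnitude, so by this criterion both simpler models fit decisively worse than the diffusion-sensitive model.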


The diffusion-sensitive model is the best fit for the data, followed by 
the model with family-level rate variation. There is little support for the 
idea that universal trends alone account well for the variation in the data. 
A question arises as to why there is support for family-specific rate
variation in this paper’s models, but not in the results of Jäger and
Wahle (2021), who analyze Greenbergian word order correlations. A reason
for this may be that models of the evolution of interdependent features
capture global dynamics rooted in cognitive pressures, which hold across
families. Here, I analyze the independent evolution of specific prosodic
traits, which are preferred to different degrees in different phylogenies 
and linguistic areas, and are thus better characterized by evolutionary 
dynamics that vary across families. 

For MODEL-FAM-GEO, I inspect the posterior distribution of the
parameter $\beta^{\mathrm{GEO}}$, which controls the effect of
branch-level phylogeographic diffusion rates on branch-level speeds of
change (see Fig. 4). All posterior samples are greater than zero,
indicating decisive evidence for a
positive effect of phylogeographic diffusion rate on the speed of linguis- 
tic change. This indicates that branches with faster speeds of migration 
tend to exhibit higher overall speeds of change for the linguistic features 
analyzed here. This result can be interpreted as evidence that change in 
prosodic features shows sensitivity to dynamics of geospatial change, 
indirectly pointing to the role of contact in language change. 

This result is intriguing, although it should be evaluated with care. 
Conclusive acceptance of this result will hinge on careful model criticism 
in order to ensure that this result is not an artifact of some properties of
the model that are incorrect. An important step, outside of the scope of
this paper, is to inspect distributions of diffusion rates at the individual 
branch level to ensure that patterns dovetail with extra-linguistic infor- 
mation regarding individual languages’ dispersal. A preliminary inspec- 
tion of the phylogeographic parameters, specifically, the inferred loca- 
tions for proto-homelands (Fig. 5) of the different families in the sample, 
indicates some shortcomings and directions for improvement. Many of 
the inferred homeland locations do not reflect consensus views found in 
the literature: to mention only a few, the Indo-European homeland is 
inferred to be in Central Europe, which is not a serious candidate for the 
Indo-European Urheimat; the Austronesian homeland is inferred to be in 
Indonesia rather than Taiwan; the Turkic homeland is located further to 
the west than the traditional view holds it to be. It appears to be the case 
that inferred homeland locations are highly sensitive to biases in the
linguistic sample: non-European Indo-European languages are
underrepresented, as are Formosan Austronesian languages. Ancient and
medieval languages that can serve to produce more plausible estimates of
homeland locations are also missing from this data set. This issue can
be alleviated in a number of ways. One approach might involve imposing
relatively informative priors over homeland locations, based on
proposals in the literature; this would be particularly helpful in
situations where migration of speech communities rather than expansion
has taken place.
Another approach would be to estimate phylogenetic parameters on the 
basis of a larger tree than the subtrees for which typological data are 
available while estimating parameters of the CTM model on the basis of 
a smaller tree. It is worth noting that none of the trees used here are cali- 
brated to produce realistically timed branch lengths, so the use of pub- 
lished trees with more accurate chronologies will aid this process as well 
(there is the risk that unrealistically shallow or deep chronologies for lan- 
guage phylogenies will respectively overestimate and underestimate 
phylogeographic diffusion rates, as the model will take languages to have 
undergone more or less migration over time than expected). Fitting 
family-level, as well as global and branch-level diffusion rate multipliers, 
might help to account for variation in chronologies among phylogenies 
that stems from modeling assumptions (cf. Chang et al. 2015). At the 
same time, if different families have in fact undergone geographic
dispersal at different overall rates, then fitting family-level rate multipliers
might suppress meaningful effects of diffusion rate on linguistic change. 

Yet another issue is the fact that for this paper, phylogeographic 
parameters and CTM parameters were co-inferred, with posterior param- 
eters inferred on the basis of the joint distribution of the phylogeographic 
and CTM parameters. This has the potential to produce different results 
than a procedure in which phylogeographic parameters are first estimated 
on their own, with CTM parameters subsequently estimated on the basis 
of phylogeographic distributions (employing some form of measurement 
error). Just as phylogenetic comparative methods tend to treat the phy- 
logenetic representation of taxa under study as a given, not to be coesti- 
mated with evolutionary dynamics of the data under study, so too might 
we wish to treat phylogeographic distributions as given quantities on the 
basis of which we wish to condition our models. 

These issues aside, this case study serves as an important proof of 
concept for the integration of phylogeographic models and CTM models 
of linguistic evolution. Related approaches can investigate more direct 
questions regarding the role of geography in language change. Here, I
looked only at the effect of diffusion rate on the speed of feature change, 
but this variable’s role in shaping trends toward simplification could be 
investigated as well (cf. Jing et al. 2022). Additionally, inferred longitude 
and latitude values for internal nodes of phylogenies in a sample can be 
incorporated into a CTM model of character evolution. These values 
could be used to detect spatial autocorrelation in branch-level fluctuation 
in speed or preferences for particular features in a data set. Under most 
circumstances, Gaussian Processes (Rasmussen and Williams 2006) are 
ideal for modeling spatial autocorrelation, given their flexibility. At the 
same time, they are computationally costly for large numbers of data 
points, as the covariance between each pair of data points must be com- 
puted. For phylogenetic samples like this paper’s, there are too many 
branches for this to be computationally feasible. A recently developed
method approximates draws from a Gaussian Process using basis
functions (Riutort-Mayol et al. 2020), which lowers the computational cost
for large data sets. At the same time, this method requires several non- 
trivial decisions on the part of the user, such as the choice of the number 
of basis functions. A Gaussian Process-based approach may be more 
appropriate for targeted studies involving smaller numbers of phy- 
logenies comprising languages spoken in the vicinity of each other. 


OUTLOOK 


In the previous sections, I introduced distributional phylogenetic model- 
ing, an approach that takes as its inspiration advances in hierarchical 
Bayesian modeling and builds off of biological models of the co-evolu- 
tion of continuous and discrete traits. I showed how these models can be 
used to analyze rate variation across features as well as phylogenetic lin- 
eages, and demonstrated that by decoupling speed of change and station- 
ary probability of feature presence, we can uncover the effects of 
different predictor variables on different components of change. It is 
hoped that this framework will be useful in further advancing our under- 
standing of the relationship between different extralinguistic and linguis- 
tic variables, which are much discussed in the literature but generally 
analyzed in a regression framework, which does not explicitly model the 
diachronic dynamics of these relationships. For instance, if we are will- 
ing to assume that a language’s altitude, often taken as a proxy for social 
isolation (Nichols and Bentz 2019), evolves according to a stochastic 
process such as Brownian motion or variants thereof, we can assess 
whether decreases in altitude have an effect on linguistic complexity, as 
operationalized by some discrete feature. For other extra-linguistic
features, such as population size and environmental data, simple models
like Brownian motion are most likely not appropriate, and analyses may
have to make use of historical data in order to produce accurate
estimates (e.g., Huebner 2020).

An issue of concern is the fact that the models presented here involve 
binary data, which can be straightforwardly parameterized according to 
the speed and stationary probability of the CTM process according to 
which the data are assumed to evolve. Extending this approach to non- 
binary data types requires some serious thought. The General Time- 
Reversible model (Tavaré 1986) explicitly models CTM processes for 
non-binary data according to the stationary probabilities of each state as 
well as exchange rates between each pair of states, representing the rate 
of change between each pair of states irrespective of its direction. Given 
this setup, it is straightforward to allow both speed and stationary
probability to vary independently across rate regimes. At the same time,
for $K$ states, the GTR model contains $K + \frac{K(K-1)}{2}$ free
parameters, in comparison to a CTM process involving independent rates
between each pair of states, which would have $K(K - 1)$ parameters. It
is therefore possible that the GTR model is too restrictive to capture
certain phenomena. This is to say nothing of the difficulties that may
arise in interpreting the effects of predictors on different components
of a more complex process of change.

Ultimately, while much ground has been broken in developing flex- 
ible models designed expressly for phylolinguistics, many tasks remain 
in fully understanding the diachronic pressures that shape synchronic lin- 
guistic distributions. It is hoped that with an increase in flexible 
approaches, we will move closer to this goal.


APPENDIX


Model Specification 


Below, $D$ denotes the number of features in the data set, and $J$ the number of phylogenies in
the sample. Together, all of the phylogenies in the sample contain $B$ branches. The
phylogenetic ID of a given branch $b \in \{1, \ldots, B\}$ is denoted by $\phi_b$.


MODEL 


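For all branches in all phylogenies, the gain rate of feature $d \in \{1, \ldots, D\}$ is equal to
$\pi_d s_d \tau$ and the loss rate is $(1 - \pi_d) s_d \tau$. The parameters $\pi$ and $s$ indicate
stationary probabilities and speed multipliers, and are both constrained to take values between
0 and 1; $\tau$ is a global speed of change. These parameters are further defined below:


\[
\begin{aligned}
\operatorname{logit} s_d &= \alpha^{s}_{d} \\
\operatorname{logit} \pi_d &= \alpha^{\pi}_{d} \\
\alpha^{s}_{d},\ \alpha^{\pi}_{d} &\sim \mathrm{Normal}(0, 1) \\
\tau &\sim \mathrm{Uniform}(0, 10)
\end{aligned}
\]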

The likelihood of each feature in each phylogeny under these parameters can be computed 
using Felsenstein’s pruning algorithm (Felsenstein 1981, 2004). I assume a uniform prior 
probability of each feature state at the root of each tree. The model (along with all models 
described in this section) was fitted separately on 10 trees from the tree sample using RStan 
(Stan Development Team 2019). 
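
To show how the priors of MODEL translate into gain and loss rates, the following Python sketch draws one set of rates from the prior. It is an illustration only (the models themselves were fitted in Stan); the function names and the choice of random seed are assumptions made for the example.

```python
import math
import random

def inv_logit(x):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def draw_prior_rates(num_features, rng):
    """Draw a global speed and per-feature (gain, loss) rates from the priors of MODEL."""
    tau = rng.uniform(0.0, 10.0)             # global speed, Uniform(0, 10)
    rates = []
    for _ in range(num_features):
        s = inv_logit(rng.gauss(0.0, 1.0))   # speed multiplier, logit-scale Normal(0, 1)
        pi = inv_logit(rng.gauss(0.0, 1.0))  # stationary probability, logit-scale Normal(0, 1)
        rates.append((pi * s * tau, (1.0 - pi) * s * tau))
    return tau, rates

# Twelve features, matching the number of binary stress traits in the case study.
tau, rates = draw_prior_rates(12, random.Random(1))
```

Since each multiplier lies between 0 and 1, the gain and loss rates of a feature always sum to less than the global speed drawn for that sample.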


MODEL-FAM 


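For an individual branch with index $b \in \{1, \ldots, B\}$, if $\phi_b = j$, the gain rate of feature $d$ is equal
to $\pi_{d,j} s_{d,j} \tau$ and the loss rate is $(1 - \pi_{d,j}) s_{d,j} \tau$. These parameters are further defined below:


\[
\begin{aligned}
\operatorname{logit} s_{d,j} &= \alpha^{s}_{d} + \beta^{s}_{d,j} \\
\operatorname{logit} \pi_{d,j} &= \alpha^{\pi}_{d} + \beta^{\pi}_{d,j} \\
\alpha^{s}_{d},\ \beta^{s}_{d,j} : j \in \{1, \ldots, J\},\ \alpha^{\pi}_{d},\ \beta^{\pi}_{d,j} : j \in \{1, \ldots, J\} &\sim \mathrm{Normal}(0, 1) \\
\tau &\sim \mathrm{Uniform}(0, 10)
\end{aligned}
\]


MODEL-FAM-GEO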


Under Brownian motion and related diffusion processes, the latitude and longitude (written 
$\ell$ for brevity) at the tips of a phylogeny follow a multivariate normal distribution (cf. O’Meara 
et al. 2006): 


$$\ell \sim \mathrm{MultiNormal}(\ell^{\mathrm{root}}, \Sigma)$$


Above, $\ell$ is the vector of observed values at the tips of the tree, and $\ell^{\mathrm{root}}$ is a vector the same 
length as $\ell$ which repeats the value at the root, which is unobserved; $\Sigma$ represents the 
phylogenetic covariance between tips in the phylogeny. Under time-homogeneous Brownian 
motion, the phylogenetic covariance between languages $i$ and $j$, $\Sigma_{ij}$, is equal to the sum of the 
lengths of the branches leading from the root of the phylogeny to the most recent common 
ancestor (MRCA) of the two languages, multiplied by a parameter $\sigma$ representing the scale 
of the diffusion process. Under a relaxed random walk model, $\Sigma_{ij}$ is equal to 
$\sigma \rho^{\top} \zeta_{\mathrm{mrca}_{ij}}$, 
where $\sigma$ is the global scale of the process’ diffusion, $\rho$ is a vector of scale multipliers for each 
branch in the phylogeny and $\zeta_{\mathrm{mrca}_{ij}}$ is a vector that contains the lengths of the branches that 
intervene between the root of the phylogeny and the most recent common ancestor of nodes 
$i$ and $j$, with zeroes corresponding to branches that do not intervene. 
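
The covariance construction just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the fitted model: it assumes a tree stored as hypothetical `parent` and `length` dictionaries, identifies each branch by the node at its lower end, and treats the time-homogeneous case as the special case in which every branch multiplier equals 1.

```python
def root_path(parent, node):
    """Set of branches (named by their child node) between the root and `node`."""
    path = set()
    while parent[node] is not None:
        path.add(node)
        node = parent[node]
    return path


def phylo_covariance(parent, length, tips, sigma, rho=None):
    """Covariance between tips: sigma times the rho-weighted length of shared branches.

    The branches shared by the root-to-tip paths of i and j are exactly those
    between the root and their most recent common ancestor.
    """
    if rho is None:  # time-homogeneous Brownian motion
        rho = {b: 1.0 for b in length}
    paths = {t: root_path(parent, t) for t in tips}
    cov = []
    for i in tips:
        row = []
        for j in tips:
            shared = paths[i] & paths[j]
            row.append(sigma * sum(rho[b] * length[b] for b in shared))
        cov.append(row)
    return cov


if __name__ == "__main__":
    # Hypothetical tree ((A:1.0, B:1.0)n1:0.5, C:1.5)root
    parent = {"root": None, "n1": "root", "A": "n1", "B": "n1", "C": "root"}
    length = {"n1": 0.5, "A": 1.0, "B": 1.0, "C": 1.5}
    for row in phylo_covariance(parent, length, ["A", "B", "C"], sigma=2.0):
        print(row)
```

Passing a `rho` dictionary with values other than 1 lets individual branches diffuse faster or slower than the global scale, which is the relaxation the text describes.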


I assume that for each phylogeny, $\mathit{lon}$ and $\mathit{lat}$ are distributed as above, with the scale 
parameters $\sigma$ and $\rho$ shared across both dimensions of diffusion. For simplicity, I do not 
model correlation between the two dimensions, and do not account for measurement error 
in longitude and latitude values recorded for languages. 


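For an individual branch with index $b \in \{1, \ldots, B\}$, if $\phi_b = j$, the gain rate of feature $d$ is equal 
to $\pi_{d,j} s_{d,j} \tau$ and the loss rate is $(1 - \pi_{d,j}) s_{d,j} \tau$. These parameters are further defined below: 


$$
\begin{aligned}
\operatorname{logit}(s_{d,j}) &= \alpha^{s}_{d} + \beta^{s}_{j} + \beta^{\mathrm{geo}} \rho_b \\
\operatorname{logit}(\pi_{d,j}) &= \alpha^{\pi}_{d} + \beta^{\pi}_{j} \\
\alpha^{s}_{d},\ \alpha^{\pi}_{d} : d \in \{1, \ldots, D\};\quad \beta^{s}_{j},\ \beta^{\pi}_{j} : j \in \{1, \ldots, J\};\quad \beta^{\mathrm{geo}} &\sim \mathrm{Normal}(0, 1) \\
\tau &\sim \mathrm{Uniform}(0, 10) \\
\sigma,\ \rho_b : b \in \{1, \ldots, B\} &\sim \mathrm{Gamma}(1, 1) \\
\mathit{lon}^{\mathrm{root}}_{j} : j \in \{1, \ldots, J\} &\sim \mathrm{Uniform}(-180, 180) \\
\mathit{lat}^{\mathrm{root}}_{j} : j \in \{1, \ldots, J\} &\sim \mathrm{Uniform}(-90, 90)
\end{aligned}
$$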
CHAPTER NINE 


Consequences of reflexivity in language 


N. J. Enfield and Jack Sidnell 


Abstract 


Language is widely held to underpin cumulative technology and social institu- 
tions. We argue that central to this power of language is one under-acknowl- 
edged feature: namely, the reflexivity of language. Language can be used to refer 
to itself. We first define reflexivity in language, and explicate some of the aspects 
of language that are made possible by it, including names, reported speech, par- 
aphrase, tense, and pronouns. We then argue that the reflexive property of lan- 
guage has had at least three revolutionary consequences for our species: first, 
reflexivity enables quoted speech, crucial for reach and reputation management; 
second, reflexivity enables the building of texts, such as the narratives that build 
common sense-making as well as legal texts that create social realities; third, 
reflexivity of language enables social accountability, which is indispensable for 
the creation of social realities—anything grounded in rights and duties, from 
ownership to political authority. In the final section, we discuss an apparent par- 
adox arising from the claim that metalanguage is a prerequisite for language, and 
we speculate that practices of repair in interaction (e.g., saying “Huh?” when one 
person hasn't understood what another is doing) may constitute a path by which 
metalanguage precedes language. 


Consider two remarkable features of humanity. 


The first one is technology. Technological innovations have been central 
to our species’ staggering series of transformations; not just starting in 
the eighteenth century with the spinning jenny or the steam engine, but 
further back in time with the printing press, gunpowder, the levered 
crane, and the outrigger canoe. While tool use is observed in animals 
(Shumaker, Walkup, and Beck 2011), technology is uniquely human. 
Why? Because it requires a way to achieve the ratchet effect that makes 
cumulative culture possible (Tomasello, Kruger, and Ratner 1993). Each 
new technology incorporates previous ones and builds on them in flex- 
ible, adaptive, and open-ended ways (Arthur 2009). A technology will 
incorporate natural forces and affordances and encode the intentions of 
both its designer and the deep lineage of designers who contributed to its 
development over centuries and millennia. 

The second remarkable feature of humanity is social institutions. 
Humans have highly diverse ways of coordinating behavior with refer- 
ence to norms. The culturally diverse and locally defined commitments 
we make in realms of life, ranging from avoidance of moral transgres- 
sions to membership in clubs and societies, are the foundation of person- 
hood. One way in which we make sense of each other’s lives—for exam- 
ple, when predicting or interpreting behavior—is through our observa- 
tion of the rights and duties that define social institutions. While technol- 
ogies incorporate designers’ intentions and nature’s affordances, social 
institutions incorporate the rights, duties, and values we attribute to 
others (though, ultimately, they are also grounded in natural causes, such 
as those entailed by biology and physics, and the forces that ultimately 
regiment social institutions through the state’s monopoly on force). 

Technology and social institutions have been central to the extraordi- 
nary upscaling of our modern world. Today, our technologies are capable 
of transporting a 6.5-meter wide sheet of gold one-thousandth the thick- 
ness of a hair to the second Lagrange point, one million miles away from 
Earth; and our social institutions are capable of imposing norms and 
values upon billions of people at a time. 

Our view is that technology and social institutions have the same fun- 
damental enabling substrate: language. But what does it mean to say that 
language underlies modern civilization? The answer is that language 
enables social coordination of a unique kind. We focus here on one cru- 
cial element of this answer: we argue that a foundationally causal factor 
by which language has made both technology and social institutions pos- 
sible is its reflexivity. 

In the following sections, we first define reflexivity in language and 
then explain what it enables and how. 


The reflexive property of language is easily defined: language can be 
used to communicate about itself. This is a unique property among ani- 
mal communication systems, as Charles Hockett (1966: 13) wrote: “Bees 
dance about sites, but they cannot dance about dancing.” That being said, 
there are precedents in animal behavior: Gregory Bateson (1972: 178) 
distinguished metalinguistic messages in which “the subject of discourse 
is the language” from metacommunicative ones in which “the subject of 
discourse is the relationship between the speakers” (see also Lucy 1993). 
Bateson (1972: 178) further suggested that “there occurs a further class 
of implicit messages about how metacommunicative messages of friend- 
ship and hostility are to be interpreted.” This next level of abstraction is 
illustrated by the famous example of the metamessage “this is play.” 
Bateson reports on a trip to the Fleishhacker Zoo in San Francisco: 


“I saw two young monkeys playing, i.e., engaged in an interactive 
sequence of which the unit actions or signals were similar to but not 
the same as those of combat. It was evident, even to the human 
observer, that the sequence as a whole was not combat, and evident to 
the human observer that to the participant monkeys this was ‘not 
combat’.” 


As he states (Bateson 1972: 180), “The playful nip denotes the bite, 
but it does not denote what would be denoted by the bite.” Bateson’s con- 
clusion is that the evolutionary transition from involuntarily produced 
and recognized “mood signs” to voluntarily controlled and interpreted 
“signals” presupposes a capacity for metacommunication: “Denotative 
communication as it occurs at the human level is only possible after the 
evolution of a complex set of metalinguistic (but not verbalized) rules 
which govern how words and sentences shall be related to objects and 
events” (Bateson 1972: 180). 

It is not clear why linguists seldom contemplate the importance of 
reflexivity in language.¹ It could seem trivial or incidental to other 
celebrated features such as the productivity that a generative morphosyntax 
provides; however, reflexivity is anything but trivial. In this regard, 
Dorthe Duncker (2019: 1) writes: 


“Imagine what it would be like to live in a world without linguistic 
reflexivity — in a world where it was impossible to ask people what 
they meant by what they said, or even to ask them to repeat their last 
utterance; where you could not ask someone about an unknown word, 
and you had no dictionary to look it up, because written language did 
not exist; where it was impossible to tell one person what another per- 
son had just told you; where there were no ways of talking about 
words, questions, meanings, promises, etc., etc.; ... where you did not 
have a name, or any idea about what a name was.” 


We now explicate some of these ideas. Consider the following sen- 
tence: 


¹ It is possible that reflexivity can be derived from referentiality/displacement. It 
could conceivably follow from a combination of “openness” and “universality,” 
in Hockett’s terms. 
When John F. Kennedy said “Ich bin ein Berliner”, he did not mean 
“I am a jelly doughnut.” 

There are several ways in which this sentence would not be possible without 
reflexivity in language. First, the personal name “John F. Kennedy” is 
only possible because of a “code about code” function in language: “The 
general meaning of a proper name cannot be defined without a reference 
to the code [...] in the code of English, ‘Jerry’ means a person named 
Jerry” (Jakobson 1971: 131). Second, the sentence features quoted 
speech, making it a sentence about another sentence: it is “speech within 
speech” (Jakobson 1971: 130). Third, the words in one language (“Ich 
bin ein Berliner”) can be given a gloss of their possible meaning in 
another language (English: “I am a jelly doughnut”). This paraphrase 
function is also crucial within a language, especially in language learning 
when we use known words to explain the meanings of unknown ones, a 
process that plays “a vital role in the acquisition and use of language” 
(Jakobson 1971: 131). Fourth, the past-tense marking on finite verbs in 
the sentence (“say” → “said”, “do” → “did”) links language to language 
in the sense that it specifies the connection between the event being 
spoken about (JFK’s speech that took place in 1963) and the current 
speech event (now). Fifth, the semantics of the verbs “say” and “mean” 
cannot be defined without reference to language. And sixth, the word 
“he” in the example is easily understood to refer back to Kennedy, who is 
explicitly named only in the first part of the sentence. Grammatical 
devices such as this case of anaphora link language to language in ways 
that provide coherence and cohesion to texts (Halliday and Hasan 1976), 
allowing us to go beyond mere isolated signals or calls. Without such 
devices for textual cohesion, we would have no way of building narra- 
tives, conversations, instructions, curricula, legislation, or any of the 
other million kinds of text upon which our social reality depends. 

Reflexivity runs deep in human language (Agha 2007; Duncker 2019; 
Lucy 1999; Taylor 2000), as well as in human interaction (Czyzewski 
1994; Enfield and Sidnell 2022; Krippendorff 1989), and it is not found 
in any other animal communication system (Hockett 1966). Our goal 
here is to convince you that the consequences of this are significant and 
that they should be central to any arguments about the evolution of lan- 
guage in our species. 


We argue that the reflexive property of language had at least three rev- 
olutionary consequences for our species; we discuss them in turn. 


i. 
Reflexivity of language enables quoted speech. This mechanism makes 
it possible, as a unique effect of language among animal communication 
systems, to separate the “animator” of a signal, i.e., the one who 
physically produces the signal that is perceived, from its “author” and 
“principal”, i.e., the ones who compose and take responsibility for its 
content (Goffman 1981). In turn, this means that one person’s message 
can reach another person without the need for the two people to be in 
each other’s presence. This arguably brought about the most significant 
information revolution in the history of our species. It enables a message 
to reach a recipient who is away from the sender of that message, which 
in turn has the effect of compressing space and time in social networks 
and opening the door to potentially infinite upscaling of coordination 
through communication. We speculate that every information revolution 
since—including writing, printing, mass media, and the Internet—has 
been a quantitative upscaling based on this singular qualitative advance. 

The possibility of quoted speech plays an important role in reputation 
management, a foundational aspect of human society (Dunbar 1996; 
Emler 1990; Haviland 1977). This can occur at the personal level, relat- 
ing, for example, to remarks about specific things that a person has said, 
with evaluations of whether those things are good or bad. Or it can occur 
at group levels, for example, in the sociolinguistics of group-identity 
badges such as regional origin, level of education, socioeconomic back- 
ground, age, and racial or ethnic identity. It is not just that these social 
distinctions are indexed by differences in linguistic behavior, but that the 
reflexive nature of language allows us to thematize and characterize 
those differences, focusing joint attention on them and coordinating 
around them for social purposes. Arguably, all social accountability is 
grounded in these reflexive elements of language. All social accountabil- 
ity requires that we can jointly and publicly characterize, and thus coordi- 
nate around, the behavior that is being held to account or judged (whether 
for praise or blame). That public characterization and coordination would 
be impossible without language. 


Reflexivity of language enables the building of texts. Texts, in the broad- 
est sense of that term, facilitate the communication, codification, and 
coordination that underpin cumulative culture (including science) and 
institutional structure. At base, language comes in small units, is infi- 
nitely flexible, and is highly unpredictable. There needs to be glue 
between its units if we are to go beyond mere isolated calls, and that glue 
depends on language-directed bits of language, such as—to return to the 
JFK/Berliner example—the pronoun ‘he’ (linking back to Kennedy) and 
the conjunction ‘when’ (linking the two clauses together). 

Linguistic tools for textual cohesion are well studied (Chafe 1980; 
Foley and Van Valin Jr 1984; Gernsbacher and Givón 1995; Givón 1990; 
Halliday and Hasan 1976). Grammatical forms such as pronouns provide 
such glue as part of the machinery for marking reference-tracking across 
lengthy discourses (Foley and Van Valin Jr 1984; Gernsbacher and Givón 
1995). Pronouns are often derived historically from demonstrative 
elements (e.g., “this” or “that”) whose core function is to link language to 
the immediate context (Dixon 2003; Fillmore 1997; Hyslop (Malau) 
1993; Levinson et al. 2018; Talmy 2017); and indeed, these are ulti- 
mately derived from the deictic pointing gestures that are prerequisite to 
language (Enfield and Sidnell 2022; Tomasello 2006). Through historical 
processes of grammaticalization (Hurford 2012; Traugott and Heine 
1991), speakers of a language are provided with complex means by 
which devices in the language can be used for organizing the language 
itself into higher-order structures. This applies not only to extended texts 
such as narratives, recipes, or legal documents, but also to conversational 
structures such as turn-taking, sequence organization, and repair (Enfield 
2017; Levinson 1983; Sidnell and Stivers 2013). 


Reflexivity of language enables social accountability. This is because 
social accountability is a defining element of social reality, in the sense 
championed by the philosophers of language John Austin (1962) and 
John Searle (1995, 2010). Social reality refers to facts that are created by 
virtue of speech acts of a kind known as status function declarations. An 
example is when a marriage celebrant declares that a couple are now 
legally married. The words become true by means of the speech act, 
which in turn creates new rights and duties for the married couple, that 
is, accountability to their new normative and legal status. These 
declarations work when (a) it is publicly agreed that the declaration has been 
properly made (i.e., that there is a joint commitment among relevant 
parties), and this can only happen if people’s joint attention is on the act 
of declaration itself; and (b) it is possible to refer back to the declaration 
later if it is necessary to invoke the rights and duties created by the speech 
act, such as when holding someone to account for transgressive behavior. 
An example is when sanctioning someone on the basis that they are mar- 
ried, either legally (as when prosecuting them for bigamy) or morally (as 
when condemning them for adultery). This is the possibility of “rebuke” 
that the philosopher Margaret Gilbert insisted was definitive of joint 
action. In her example, when two people are engaged in the joint action 
of going for a walk together, then it is possible for one to rebuke the other 
for not doing their part, for example, by saying “You are going to have to 
slow down! I can’t keep up with you” (Gilbert 1990: 3). 

As Searle (2010) writes, “You make it the case that you promise by 
saying, ‘I promise’.” Less obviously perhaps, you make it the case that 
we are walking together by saying “Let's go for a walk.” This is why 
social reality is linguistic in nature. But it is also necessarily meta-lin- 
guistic in nature. Why? Because a promise brings with it a form of 
accountability—the possibility of rebuke—that is also activated through 
language: “But you promised!” Or in the case of bigamy, legal accounta- 
bility would require linguistic reference to a linguistic act—language 
about language—in the form of a citation of the linguistically-created 
fact that the first couple are in fact married. All social reality is thus not 
only a linguistic matter in that it was created by language, as Searle 
(2010) argues; it is also a metalinguistic matter. 

Sometimes it seems as if social realities can be established without 
language, as when a row of stones marks a territorial boundary. It can be 
argued, though, that language is always involved, first because such 
things as border-marking stones are arbitrary symbols and thus proxies 
for language, and second because the reality-creating power of the 
markers is only possible because of the ever-present possibility of talking 
about it, as when, for example, one states, “The land on this side of the 
line is mine.” In most animals, such territorial disputes are managed 
solely in the realm of brute reality, while in humans we can first draw 
attention (using language) to the established rights and duties, tending 
only later to resort to force. The key point is that any time the relevant 
rights and duties are invoked in matters relating to social accountability, 
such invoking is also done through language, indeed, through language 
about language. 


IV 


In the above sections, we have explored the implications of the fact that 
human language has the unique capacity to be directed toward itself. We 
have pointed to some remarkable consequences of this. But there is more. 
There is the idea that “without ‘second-order’, reflexive properties, ‘first 
order’ language itself could not exist” (Taylor 2000: 483). How could this 
paradoxical statement be true? 

The first step is to explain what it means for “first order” language to 
be dependent on “second-order” reflexive properties. The second step is 
to speculate as to how language could emerge at all if its ‘second-order’ 
properties had to be already present. We now take these points in turn. 


Let us explain how the existence of the building blocks of language 
depends on our communications about those building blocks. 

One type of building block is the utterance or move, the one- or 
two-second burst of noise (Chafe 2018) that captures a basic unit of 
propositional content and that serves as a brick for building sequences of 
interaction (Bavelas 2022; Enfield 2013; Enfield and Sidnell 2022; Schegloff 
1968, 2007; Sidnell and Stivers 2013). Research on language in social 
interaction has found that with every move in the progression of an inter- 
action, the participants in that interaction must effectively register their 
continuing participation in relation to what is said, indicating their atten- 
tion, understanding, uptake, satisfaction, confusion, or failure in relation 
to (at least) the last move made (Goffman 1963, 1964). Clark (1996) 
argues that every move made in a conversation carries an implicit reflex- 
ive message: “Have you heard and understood adequately so far?” If the 
answer is “no”, this can be signaled using the linguistic system of 
other-initiated repair, by which we use expressions like Huh?, What?, or Who? 
to convey that there is some problem with hearing or understanding 
(Dingemanse and Enfield 2015; Dingemanse et al. 2015; Schegloff, Jef- 
ferson, and Sacks 1977). Conversely, if we want to explicitly indicate 
that we do not have any such problem, we can use explicit continuers like 
uh-huh, which give the all-clear for the conversation to continue (Sche- 
gloff 1982). And in between those possibilities is the numerically dom- 
inant behavior of neither initiating repair nor explicitly indicating that all 
has been adequately heard and understood (Robinson 2014). But even 
this “zero” strategy is ultimately metalinguistic, in a secondary sense: 
every time a person forgoes the ever-present opportunity to initiate 
repair, indicating some problem of hearing or understanding, by the prin- 
ciple of antithesis (Darwin 1872: 28), they effectively affirm that the 
interaction remains on track. 

These same observations apply mutatis mutandis to another kind of 
linguistic building block: words and other minimal units of conventional- 
ized meaning. At any point in an interaction, the aptness of a particular 
word may be challenged or negotiated in a variety of ways in an immedi- 
ate subsequent next turn at talk (Enfield and Sidnell 2022; Sidnell 2014; 
Sidnell and Barnes 2012). For instance, in one case, when a first speaker 
says, “Your anger and your hate, I think is coming off as erratic” the other 
corrects with, “My passion” thereby refusing to accept “anger and hate” 
as an adequate or accurate description of his emotional state and beha- 
vior. In another case, when a first speaker says “You stuck it in the spon- 
sorship budget” the other responds with “Pardon?” and the first speaker 
then reformulates the utterance as “You decided ... to place these pur- 
chases in the sponsorship account, correct?” As these examples and many 
others like them demonstrate, speakers are held accountable for the con- 
ventionalized meanings they convey by virtue of word choice. Behaviors 
such as correction and other-initiation of repair are reflexive operations 
of language that serve to police the appropriateness of fit between lan- 
guage and context. This affects the circulation of both the linguistic items 
and the norms of accountability around their usage. And as with moves in 
sequences of interaction, when people choose not to query the appropri- 
ateness of a word in a given context, they are tacitly accepting it and 
endorsing its usage as acceptable. This is only possible in so far as lan- 
guage can be turned in on itself: this reflexive capacity of language 
allows it to serve simultaneously as the instrument and object of account- 
ability. 


Thus, we have seen that units at the most basic, “first-order” level— 
including the words we use as bricks for building moves and the moves 
we use as bricks for building discourses—are grounded in, and shaped 
by, “second-order” regulatory metalinguistic actions and interpretations 
that could not exist without the reflexive quality of language. But if the 
“second-order” behavior is a prerequisite for the “first order” structures, 
how could the core system have evolved in the first place? We now 
briefly speculate about this riddle of reflexivity and the origins of lan- 
guage. 

We think that a solution to the riddle can be found in the organization 
of repair, a universal of human language (Dingemanse and Enfield 2015; 
Dingemanse et al. 2015; Hayashi, Raymond, and Sidnell 2013; Sche- 
gloff, Jefferson, and Sacks 1977). It has been discovered that Huh? is a 
word found in every language for which there is data (Dingemanse, Tor- 
reira, and Enfield 2013). We have speculated elsewhere that Huh? is “as 
close as you can get to the core of the human faculty for language” 
(Enfield 2017: 207). It is the most simple, general, omnirelevant, all-pur- 
pose tool for linguistically-oriented accountability. Could Huh? have 
been the first word? We can imagine a stage when early humans were 
able to draw attention to some element of an action-response sequence, 
producing an overt interjection of puzzlement (“Huh?”) in a structural 
slot where a behavioral response was expected to have occurred in the 
interactional sequence. The interjection would be metacommunicative in 
so far as it is about the communicative action just made by another agent. 
Then imagine that this puzzled, meta-communicative response promotes 
the redoing of the gesture. In turn, the link between the interjection of 
puzzlement and the re-doing of the not-yet-understood gesture could 
become ritualized and so may be used repeatedly to bring about the same 
end (i.e., to elicit a repetition of the previous communicative move), 
either by this proto-human or by another who happened to witness the 
events unfold. We can imagine here a shift from learning via ontogenetic 
ritualization to imitation, i.e., conventional transmission, the beating 
heart of language and culture. 

Thus, we can imagine other-initiated repair as a wedge that intro- 
duces metalanguage into a not-yet-linguistic system. It would be commu- 
nication about communication, which in turn could provide mechanisms 
of the kind discussed above, providing a context for the cultural evolu- 
tion of the semantic/referential elements of language. 


V 


We have considered some far-reaching implications that the reflexive 
quality of language has had for human social coordination and its 
products, from technology to institutions and to language itself. In doing so, 
we have encountered a riddle that any theory of the evolution of language 
must answer: if reflexivity is definitive of language, how could it have 
arisen in the first place? Following Taylor (2000), we framed reflexivity 
as a “second-order” feature; but how could the second thing come first? 
The solution is to invert the understanding of “first-order” and 
“second-order” in this equation. Thus, our argument supports the following 
conclusion: the primary functions of language are to regulate and facilitate 
social coordination, while the semantic functions by which language is 
able to encode information and achieve semantic reference are secondary 
and derivative. 